Chapter 3

Title: k-Nearest Neighbor Classification I


Chapters 3 to 6 describe classification use cases and introduce the k-nearest neighbors (k-NN) and Naive Bayes learning algorithms. Chapter 3 applies k-NN to the evaluation of teaching assistants. In Chapter 4, k-NN is used to classify glass types based on their chemical components, and the RapidMiner process is extended by Principal Component Analysis (PCA) to pre-process the data better and improve classification accuracy. Chapter 5 explains Naive Bayes as an algorithm for generating classification models and uses it to build a credit approval model that decides whether a loan requested by a potential or existing customer should be approved, i.e., whether the customer is likely to pay back the loan as agreed. Chapter 6 uses Naive Bayes to rank applications for nursery schools, introduces the RapidMiner operator for importing Excel sheets, and provides further explanations of Naive Bayes.

Table of Contents

3.1 Introduction
3.2 Algorithm
3.3 The k-NN Operator in RapidMiner
3.4 Dataset
3.4.1 Teaching Assistant Evaluation Dataset
3.4.2 Basic Information
3.4.3 Examples
3.4.4 Attributes
3.5 Operators in This Use Case
3.5.1 Read URL Operator
3.5.2 Rename Operator
3.5.3 Numerical to Binominal Operator
3.5.4 Numerical to Polynominal Operator
3.5.5 Set Role Operator
3.5.6 Split Validation Operator
3.5.7 Apply Model Operator
3.5.8 Performance Operator
3.6 Use Case
3.6.1 Data Import
3.6.2 Pre-processing
3.6.3 Renaming Attributes
3.6.4 Changing the Type of Attributes
3.6.5 Changing the Role of Attributes
3.6.6 Model Training, Testing, and Performance Evaluation

Dataset: Please download the dataset from the following location:

Processes (Chapters 3-6): Click here to download