Chapter 4

Title: k-Nearest Neighbor Classification II


Chapters 3 to 6 describe classification use cases and introduce the k-nearest neighbors (k-NN) and Naive Bayes learning algorithms. Chapter 3 applies k-NN to the evaluation of teaching assistants. In Chapter 4, k-NN is used to classify glass types based on their chemical components, and the RapidMiner process is extended with Principal Component Analysis (PCA) to pre-process the data and improve classification accuracy. Chapter 5 explains Naive Bayes as an algorithm for generating classification models and uses it to build a credit approval model that decides whether a loan requested by a potential or existing customer should be approved, i.e., whether the customer is likely to repay the loan as agreed. Chapter 6 uses Naive Bayes to rank applications for nursery schools, introduces the RapidMiner operator for importing Excel sheets, and provides further explanations of Naive Bayes.

Table of Contents

4.1 Introduction
4.2 Dataset
4.3 Operators Used in This Use Case
4.3.1 Read CSV Operator
4.3.2 Principal Component Analysis Operator
4.3.3 Split Data Operator
4.3.4 Performance (Classification) Operator
4.4 Data Import
4.5 Pre-processing
4.5.1 Principal Component Analysis
4.6 Model Training, Testing, and Performance Evaluation
4.6.1 Training the Model
4.6.2 Testing the Model
4.6.3 Performance Evaluation

