Chapter 22

Title: Instance Selection in RapidMiner

Summary

Chapter 22 introduces the RapidMiner Extension for Instance Selection and Prototype-based Rule (ISPR) induction. It describes the instance selection and prototype construction methods implemented in this extension and applies them to accelerate 1-NN classification on large datasets and to perform outlier elimination and noise reduction. The datasets analysed in this chapter include several medical datasets for classifying patients with respect to certain medical conditions, namely diabetes, heart disease, and breast cancer, as well as an e-mail spam detection dataset. The chapter describes a variety of prototype selection algorithms, including k-Nearest-Neighbors (k-NN), the Monte-Carlo (MC) algorithm, the Random Mutation Hill Climbing (RMHC) algorithm, Condensed Nearest-Neighbor (CNN), Edited Nearest-Neighbor (ENN), Repeated ENN (RENN), the Gabriel Editing proximity graph-based algorithm (GE selection), the Relative Neighbor Graph algorithm (RNG selection), the Instance-Based Learning (IBL) algorithm (IB3 selection), the Encoding Length Heuristic (ELH selection), and combinations of them, and compares their performance on the datasets mentioned above.
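To make the idea of instance selection concrete, the following is a minimal Python sketch of a Condensed Nearest-Neighbor style selection. It is not the ISPR extension's implementation; the function names (`nearest_label`, `condensed_nn`), the Euclidean distance, the choice of the first instance as seed, and the toy data are all assumptions made for illustration.

```python
import math

def nearest_label(point, prototypes):
    """Return the label of the prototype closest to `point` (1-NN, Euclidean)."""
    best = min(prototypes, key=lambda p: math.dist(point, p[0]))
    return best[1]

def condensed_nn(data):
    """CNN-style selection: keep only instances the current subset misclassifies.

    `data` is a list of (feature_tuple, label) pairs (hypothetical format).
    """
    if not data:
        return []
    chosen = {0}                  # indices already selected; seed with the first instance
    selected = [data[0]]
    changed = True
    while changed:                # repeat passes until a full pass adds nothing
        changed = False
        for i, (features, label) in enumerate(data):
            if i in chosen:
                continue          # each instance is added at most once
            if nearest_label(features, selected) != label:
                chosen.add(i)
                selected.append((features, label))
                changed = True
    return selected

# Toy data: two well-separated classes (illustrative values only).
toy = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
       ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.9, 5.2), "b")]
prototypes = condensed_nn(toy)
```

On data like this, the selected subset is much smaller than the training set while 1-NN on the subset still classifies every training instance correctly, which is where the speed-up of 1-NN comes from.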

Prototype construction methods include all algorithms that produce a set of instances as their output. The family contains all prototype-based clustering methods, such as k-Means, Fuzzy C-Means (FCM), and Vector Quantization (VQ), as well as the Learning Vector Quantization (LVQ) set of algorithms. The price for the speed-up of 1-NN by instance selection is visualized by the drop in predictive accuracy with decreasing sample size.
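In contrast to selection, prototype construction may produce instances that do not occur in the training data. The sketch below illustrates this with a basic LVQ1-style update in Python. It is an illustration under stated assumptions, not the extension's code: the function name `lvq1`, the fixed learning rate, the number of epochs, the initial prototypes, and the toy data are all hypothetical.

```python
import math

def lvq1(data, prototypes, learning_rate=0.1, epochs=20):
    """LVQ1-style training: pull the nearest prototype toward an instance
    with the same label, push it away from an instance with a different label.

    `data` and `prototypes` are lists of (feature_tuple, label) pairs.
    """
    protos = [(list(vec), label) for vec, label in prototypes]
    for _ in range(epochs):
        for features, label in data:
            # Find the prototype closest to the current instance.
            vec, proto_label = min(protos, key=lambda p: math.dist(features, p[0]))
            sign = 1.0 if proto_label == label else -1.0
            for d in range(len(vec)):
                vec[d] += sign * learning_rate * (features[d] - vec[d])
    return [(tuple(vec), label) for vec, label in protos]

# Toy data and deliberately poor initial prototypes (illustrative values only).
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.9, 5.2), "b")]
initial = [((1.0, 1.0), "a"), ((4.0, 4.0), "b")]
trained = lvq1(train, initial)
```

After training, each prototype has drifted from its initial position toward the instances of its own class, so a 1-NN classifier needs only two constructed prototypes instead of the six training instances.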

Table of Contents

22.1 Introduction
22.2 Instance Selection and Prototype-Based Rule Extension
22.3 Instance Selection
22.3.1 Description of the Implemented Algorithms
22.3.2 Accelerating 1-NN Classification
22.3.3 Outlier Elimination and Noise Reduction
22.3.4 Advances in Instance Selection
22.4 Prototype Construction Methods
22.5 Mining Large Datasets
22.6 Summary
Bibliography
