Title: Instance Selection in RapidMiner
Chapter 22 introduces the RapidMiner Extension for Instance Selection and Prototype-based Rule (ISPR) induction. It describes the instance selection and prototype construction methods implemented in this extension and applies them to accelerate 1-NN classification on large datasets and to perform outlier elimination and noise reduction. The datasets analysed in this chapter include several medical datasets for classifying patients with respect to certain medical conditions, namely diabetes, heart disease, and breast cancer, as well as an e-mail spam detection dataset. The chapter describes a variety of prototype selection algorithms, including k-Nearest-Neighbors (k-NN), the Monte-Carlo (MC) algorithm, the Random Mutation Hill Climbing (RMHC) algorithm, Condensed Nearest-Neighbor (CNN), Edited Nearest-Neighbor (ENN), Repeated ENN (RENN), the Gabriel Editing proximity graph-based algorithm (GE selection), the Relative Neighbor Graph algorithm (RNG selection), the Instance-Based Learning (IBL) algorithm (IB3 selection), and the Encoding Length Heuristic (ELH selection), as well as combinations of them, and compares their performance on the datasets mentioned above.
Prototype construction methods include all algorithms that produce a set of instances as their output. The family contains all prototype-based clustering methods such as k-Means, Fuzzy C-Means (FCM), and Vector Quantization (VQ), as well as the Learning Vector Quantization (LVQ) set of algorithms. The price of speeding up 1-NN through instance selection is visualized as the drop in predictive accuracy with decreasing sample size.
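To make the idea of instance selection for 1-NN more concrete, the following is a minimal sketch of the Condensed Nearest-Neighbor rule in plain Python. It is not the ISPR extension's implementation and does not use any RapidMiner API; the function names `nearest_label` and `condensed_nn`, the Euclidean distance, and the synthetic two-class data are all assumptions chosen for illustration.

```python
import math
import random


def nearest_label(prototypes, x):
    """Return the label of the prototype closest to x (1-NN, Euclidean)."""
    best = min(prototypes, key=lambda p: math.dist(p[0], x))
    return best[1]


def condensed_nn(data, seed=0):
    """Sketch of Condensed Nearest-Neighbor (CNN) selection.

    data: list of (feature_tuple, label) pairs.
    Returns a subset of data that 1-NN can use in place of the full set.
    Instances are visited in a shuffled order (illustrative choice).
    """
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)

    selected = [order[0]]      # indices of the instances kept so far
    in_store = {order[0]}
    changed = True
    while changed:             # repeat until a full pass adds nothing
        changed = False
        for i in order:
            if i in in_store:
                continue
            x, y = data[i]
            store = [data[j] for j in selected]
            # Keep an instance only if the current store misclassifies it.
            if nearest_label(store, x) != y:
                selected.append(i)
                in_store.add(i)
                changed = True
    return [data[j] for j in selected]


# Synthetic two-class data (hypothetical, for demonstration only).
demo_rng = random.Random(42)
train = [((demo_rng.gauss(0, 1), demo_rng.gauss(0, 1)), 0) for _ in range(100)]
train += [((demo_rng.gauss(6, 1), demo_rng.gauss(6, 1)), 1) for _ in range(100)]

subset = condensed_nn(train)
print(f"kept {len(subset)} of {len(train)} instances")
```

Classifying a new point with `nearest_label(subset, x)` then requires distance computations only against the retained instances, which is where the speed-up comes from. On noisier data the retained set grows and accuracy on unseen data may drop, which is the trade-off the chapter examines.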
Table of Contents
22.2 Instance Selection and Prototype-Based Rule Extension
22.3 Instance Selection
22.3.1 Description of the Implemented Algorithms
22.3.2 Accelerating 1-NN Classification
22.3.3 Outlier Elimination and Noise Reduction
22.3.4 Advances in Instance Selection
22.4 Prototype Construction Methods
22.5 Mining Large Datasets