Subject Index

1-Grams, 211

10-Fold Cross-Validation, 220

2-Grams, 213

2D Structure Descriptors, 315

3-Grams, 213

3D Scatter Plot, 162, 166, 173

3D Structure Descriptors, 315

A-B-C Segments, 8

Accuracy, 94, 275

Adjusted Rand Index, 161

Advanced Analytics, 3

Affinity, 78

Affinity-Based Marketing, 77

Agglomerative Clustering, 159

Aggregation, 84

AML, 262

AML Data Import, 262

Analogy Reasoning, 6

Analysis of Variances (ANOVA), 294

Analytics, 3

Anomaly Detection, 395

ANOVA, 294

API, 207

Area Detection, 340

Area under the Curve, 92, 94, 96

Artificial Neural Network, 290, 291

Artificial Neural Network Learner, 316

ASCII, 211

Association Rule Mining, 97, 113, 114, 234, 235, 239

Association Rule Visualization, 116

Association Rules, 22, 100, 249, 284

Astronomy, 257

Astroparticle Physics, 257

Attribute Role, 13, 14, 288

Attribute Roles, 41, 86, 321

Attribute Selection, 12, 67, 109, 150, 242, 261, 264, 322

Attribute Value Type, 40, 82, 92, 111

Attribute Value Type Transformation, 82

Attribute Value Types, 14, 321

Attribute Weighting, 22, 264, 322

Attributes, 11, 12, 14

AUC, 92, 94, 96

Audio Recommender System, 121

Automated Text Classification, 194

Backward Elimination, 323

Bag-of-Words Model, 215

Bag-of-Words Representation, 213

Balanced Training Set, 89

Banking Industry, 77

Bayesian Personalized Ranking Matrix Factorization, 122

Beam Search, 325

Behavior, 84

Big Data, 4

Bigrams, 213

Binary Classification, 78, 91, 95, 275

Binomial Attribute, 82

Binomial Classification, 91

Binominal Classification, 275

Biological Activities, 313, 314

Biological Property, 311

Block Plot, 173

Boolean Attribute, 82

Bootstrap Validation, 326

Bootstrapping, 22, 271, 326

Business Understanding, 78

Carcinogenicity Prediction, 314

Carpal Tunnel Syndrome (CTS), 281, 285

CART, 150

Causal Relations, 6

Centroid, 159, 160

Changing Attribute Roles, 41, 288, 321

Changing the Attribute Value Type, 40

Channel Selection, 8

Character N-Grams, 210, 223

Characteristics, 11

Chemical Properties, 314

Chemical Structures, 313

Chemistry, 319

Chemoinformatic Model, 314

Chemoinformatic Models, 311

Chemoinformatic Prediction Model, 311, 315

Chemoinformatics, 311, 319

Churn Prediction, 7, 9, 94

Churn Prevention, 7, 9, 94

Class Imbalance, 88

Classification, 8, 22, 25, 33, 45, 53, 78, 145, 149, 208, 211, 214, 229, 272, 283, 288, 314, 350

Classification Accuracy, 275

Classification and Regression Tree (CART), 150

Classification of Images, 340, 344

Classification of Text, 207

Cluster Centroids, 250

Cluster Density, 160

Cluster Internal Validation, 182

Cluster Model, 250

Cluster Validity Measures, 159

Cluster Visualization, 166, 250

Clustering, 8, 22, 157, 158, 181, 234, 240, 242, 250, 284

Clustering Validity Measures, 157

Coincidence, 4

Collaborative Filtering, 141

Collaborative Filtering Recommender System, 120

Collaborative Recommender System, 127, 130

Comma Separated Values File, 101

Concept, 13

Confidence, 90, 275

Confidence Threshold, 90

Confusion Matrix, 95

Construction, 10

Content Filtering, 208

Content-Based Recommendation, 141

Content-Based Recommender System, 120, 132

Contingency Table, 89, 95

Conversion of Images, 338

Convert Images, 335

Correlation, 6, 22, 127, 322

Cosine Correlation, 127

Cosine Similarity, 135, 215

Cost-Based Performance Evaluation, 293

Covering Algorithm, 364

Credit Default Prediction, 8, 53

Credit Risk Scoring, 53

Credit Scoring, 8

CRISP-DM, 78

Cross-Distance, 135

Cross-Industry Standard for Data Mining, 78

Cross-Marketing, 284

Cross-Selling, 9, 284

Cross-Validation, 22, 87, 95, 151, 202, 220, 263, 275, 290, 326

CSV File, 101

CSV File Import, 46, 58, 67, 101, 166, 217, 263, 315

CTS, 281, 285

Customer Behavior, 84, 97

Customer Churn Prevention, 94

Customer Insight, 8

Customer Lifetime Value, 9

Customer Loyalty, 9, 98

Customer Profile, 12

Customer Relationship, 86

Customer Retention, 9

Customer Segmentation, 8

Customer Service Process Automation, 9

Data, 15, 16

Data Cleaning, 86, 95

Data Cleansing, 111

Data Export, 22

Data Import, 21, 39, 46, 48, 58, 67, 82, 101, 166, 195, 217, 320

Data Import Wizard, 195

Data Loading Wizard, 263

Data Mining, 4

Data Preparation, 81, 95, 286

Data Preprocessing, 286, 321

Data Transformation, 22, 111

Data Type, 92

Data Types, 14, 82, 92

Data Understanding, 79

Data Warehouse, 79

Database, 21–23

Database Import, 105

Dataset, 16

Davies-Bouldin, 160

DBSCAN, 182

Decay Parameter, 292

Decision Support, 284

Decision Tree, 25, 27, 150, 272, 289, 316, 323, 341, 345

Decision Tree Induction, 25, 27

Decision Trees, 92

Demand Forecasting, 10

Deployment, 93, 95, 203

Detecting Text Message Spam, 193

Diabetes, 281, 282

Dimensionality Reduction, 339

Direct Mail, 77

Direct Mailing, 284

Direct Marketing, 8, 284

Direct Marketing Campaign Optimization, 8, 77

Discretization, 61, 87, 145

Distance Measure, 127

Distance-Based Decision Tree, 364

Document Frequency, 208, 236

Document Representation, 211

Document Vector, 222, 236

Document Vector Model, 213

Download, 19, 35, 45, 48, 58, 65, 126, 136, 138, 140, 141, 164, 167, 195, 216, 217, 234, 261, 262, 312, 334

Drug Design, 319

Drug Effect Prediction, 320

Dummy Coding, 93

E-Coli Data, 159, 161, 163, 167, 176

E-Commerce, 7

Edge Detection, 340, 350

Edge Enhancement, 340

Educational Data Mining, 143, 145, 181

Effect Coding, 93

Electronics, 10

Encoding, 211

Ensemble Classifier, 150

Ensemble of Classifiers, 272

Entropy, 28

Error Prediction, 8

ETL, 207, 228

Euclidean Distance, 215

Evaluating Feature Selection Algorithms, 264

Evaluating Feature Selection Stability, 267

Evaluating Feature Weighting Algorithms, 264

Evaluation, 22, 37, 48, 50, 61, 69, 87, 146, 148, 150, 202, 292, 325

Example, 14, 95

Example Selection, 363

Example Set, 15, 249, 347

Example Weights, 88

Examples, 13, 15

Excel File Import, 23, 67, 315

Export, 22

Export Images, 335

Extensions, 235, 334

Fact Table, 80

Factorization-Based Recommender System, 128

Failure Prediction, 8, 10

False Negatives, 95

False Positive Rate, 95

False Positives, 95, 275

Feature Extraction, 234, 334, 338, 339, 349

Feature Selection, 150, 261, 264, 322

Features, 11

Feature Selection Stability Validation, 267

Feed-Forward Backpropagation Neural Network, 291

Filter Examples, 271

Filtering Examples, 287

Finance Sector, 194

Financial Services, 7

Forward Selection, 264, 323

Fowlkes-Mallows Index, 161

FP-Growth, 113, 114, 239

Fraud Detection, 7

Frequency Distribution of Words, 250

Frequent Item Set, 113

Frequent Item Set Mining, 9, 239

Gaussian Blur, 335, 350

Gaussian Mixture Clusters, 162, 166

Generate Attributes, 269, 275

Generating Attributes, 82, 321

Glass Identification, 45

Global-Level Feature Extraction, 334, 339, 340, 344

Global-Level Features, 347

Graphical User Interface, 19, 235

GUI, 19, 235

Handling Missing Values, 322

Health Care Sector, 280

Hierarchical Clustering, 158

Hotel Review Analysis, 234

HSV, 333

HTML, 229

HTTP, 208, 228

Human Resources, 194

Hybrid Recommender System, 120, 135, 141

Hypothesis Test, 284, 294

ID Attribute, 14

Image Classification, 340, 344, 350

Image Combinations, 339

Image Conversion, 335, 338, 339

Image Data, 334, 339

Image Database, 335

Image Export, 335

Image Feature Extraction, 334, 338, 340, 349

Image Import, 335

Image Mining, 281, 333, 347, 349

Image Mining Extension for RapidMiner, 333, 334

Image Segmentation, 340, 341

Image Transformation, 339

Image Transformations, 339

IMMI Extension, 333, 334

Import, 21, 23, 217

Import CSV Files, 195, 197

Import Data, 82, 166

Import Data from Database, 105

Import Images, 335

Indicator Attributes, 85

Indicators, 11

Influence Factors, 6, 11

Information Gain, 28, 322

Installation, 19, 139, 164, 194, 234, 235, 261, 312, 334

Instance Selection, 363

Integration, 208

Intrusion Detection, 395

Item Recommendation, 122

Item Sets, 239

Iterating over a Set of Files, 337

Iterating over a Set of Images, 335, 336

Iteration, 148

Jaccard Index, 161, 268

Java Database Connectivity, 101

JDBC, 101

Join, 82, 135

k-Means, 158, 159, 167, 182

k-Means Clustering, 242

k-Medoids, 158, 182

k-Nearest Neighbor, 33, 45, 131

k-Nearest Neighbors, 122, 137, 289, 364

k-Nearest Neighbours, 208, 214, 226

k-NN, 33, 45, 122, 131, 137, 208, 214, 226, 289, 364

Kennard-Stone Sampling, 271

Knowledge Discovery from Textual Databases, 234

Kuncheva Index, 268

Label, 11, 13, 14, 86

Label Type Conversion, 321

Labeling, 337

Language Identification, 207, 209

Latent Features, 129

Learning Algorithm, 334

Learning Rate, 292

Leave-One-Out Validation, 326

Lift Chart, 90

Linear Regression, 93, 283, 289, 315, 317

Local Level Feature Extraction, 340

Local Maxima, 292

Local Minima, 292

Local Outlier Factor, 395

Local-Level Feature Extraction, 334, 339

Local-Level Features, 347

LOF, 395

Logging, 432

Logistic Model Tree, 150

Logistic Regression, 93, 323

Logistics, 8

Loop, 148, 263, 266

Loop Files, 217

Loop over Attributes, 322

Loop Parameters, 263, 266

Loyalty Cards, 99

M5 Prime, 317

Machine Failure Prediction, 10

Machine Failure Prevention, 10

Machine Learning Algorithm, 334

Machine Learning Research, 425

Machine Translation, 208

Macro Variables, 148, 336

Manufacturing Process Optimization, 10

Manufacturing, 10

Market Basket Analysis, 9, 97, 284

Marketing, 12

Marketing Campaign Optimization, 77

Markov Models, 214

Marketing, 8

Matrix Factorization, 122, 127, 128, 141

Maximum Relevance Minimum Redundancy Feature Selection, 261, 264

Media, 9

Medical Data Mining, 280

Meta Data, 15, 16

Meta-Learning, 425, 436

Missing Value Handling, 86, 111

Model, 16

Model Application, 203, 291

Model Updates, 131

Modeling, 16, 22, 25, 87, 288, 323

Molecular Descriptors, 311

Molecular Properties, 311, 320

Molecular Structure Formats, 311

Molecular Structures, 313

Momentum, 292

Monte Carlo Simulation, 269

Movie Recommender System, 121

MRMR Feature Selection, 261, 264, 265

Multi-Layer Neural Network, 291

Multi-Layer Perceptron, 291

Multiple Linear Regression Model, 315, 317

Music Recommender System, 121

MySQL, 106

N-Grams, 210, 213, 223, 240, 250

Naive Bayes, 53, 65, 149, 201, 214, 222, 289

Natural Language Processing, 208, 228

Nearest Neighbor, 33, 45

Nearest Neighbours, 208, 214, 226

Negative Example, 14

Neighborhood-Based Recommender System, 127

Network Analysis, 9

Neural Network, 92, 289–291

Neural Network Learner, 316

Neural Networks, 334

Neutrino Astronomy, 257

News Categorization, 194

News Filtering, 194

Next Best Action, 8

NLP, 208, 228

Normalization, 292

Nursery Data, 65

Object Detection, 335, 340

OLAP, 3

Online Analytical Processing, 3

Open Color Image, 336

Open Grayscale Image, 335

Operational Model, 16

Operator, 21

Opinion Mining, 8, 228

Optimization, 150

Optimize Parameters, 266

Optimizing Feature Selection and Machine Learning, 264

Optimizing Throughput Rates, 10

Outlier Detection, 7

Outlier Factor, 395

Over-Fitting, 150, 317, 322

PaDEL, 311, 320

PaDEL Extension for RapidMiner, 312

Parallelization, 436

Parameter Loop, 263, 266

Parameter Optimization, 93, 266

Partitional Clustering, 158

Patent Text Analysis, 10

PCA, 47, 50

Pearson Correlation, 127

Performance Evaluation, 38, 48, 50, 61, 69, 89, 265, 275, 292

Performance Measures, 124

Performance Metrics, 89, 95, 265

Permutation, 326

Personalized Recommender System, 120

Perspective, 19

Pharmaceutical Data Exploration Laboratory, 311

Pharmaceutical Industry, 280, 319

Physics, 257

Plotters, 173

PMML Extension for RapidMiner, 235

Point of Interest Detection, 340

Porter Stemmer, 135, 237

Ports, 22

Positive Example, 14

Precision, 89, 95, 125

Prediction, 275

Prediction of Carcinogenicity, 314

Predictive Accuracy, 275

Predictive Analytics, 8, 9

Predictive Maintenance, 10

Predictive Model, 25

Preventive Maintenance, 10

Price Prediction, 10

Principal Component Analysis, 47, 50

Probabilistic Classifier, 149

Process, 22, 23

Process Documents, 222, 223, 229, 235

Product Recommendation, 120

Product Recommendations, 9

Prototype Selection, 363

Prototype-Based Rules, 363

Pruning, 236

QSAR, 313

Quality Assurance, 10

Quality Optimization, 10

Quality Prediction, 10

Quantitative Structure-Activity Relationship, 313

R Console, 163

R Extension for RapidMiner, 235

R Packages, 164

R Script, 164, 169

Radial-Basis Function Kernel, 343

Rand Index, 161

Random Forest, 150, 265, 272, 323

Random Forest Learner, 316

Random Forests, 341

Ranking, 78, 121, 135, 215, 284

RapidAnalytics, 135, 138, 208, 228

RapidMiner, 19

RapidMiner Feature Selection Extension, 261

RapidMiner Image Mining Extension, 349

RapidMiner IMMI Extension, 349

RapidMiner Instance Selection and Prototype-Based Rules Extension, 363

RapidMiner ISPR Extension, 363

RapidMiner PaDEL Extension, 320

RapidMiner R Extension, 163

RapidMiner Recommender Extension, 121

RapidMiner Text Processing Extension, 194

RapidMiner Weka Extension, 150, 273

RapidMiner WhiBo Extension, 182

Rating, 121, 122

RBF, 343

Re-Balancing, 88

Reasoning by Analogy, 6

Recall, 89, 95

Receiver Operating Characteristics, 95

Recommender Performance Evaluation, 121

Recommender Performance Measures, 124

Recommender System, 119, 121, 141, 143

Recommender System Web Service, 138

Recommender Systems, 9

Redundancy, 322

Redundant Attributes, 87

Regression, 22, 283, 315, 317

Regular Attribute, 14

Regular Expression, 212

Regular Expressions, 430

Relative Validity Measures, 161

Relief, 322

Removing Useless Attributes, 323

Renaming Attributes, 321

Reporting Extension for RapidMiner, 235

Repository, 21, 239, 242, 249

Reputation Monitoring, 194

Retail, 7, 8, 97

RGB, 333, 347

Risk Analysis, 8

Risk Management, 8

ROC, 95

ROC Chart, 90, 91, 94

ROI Statistics, 343

Roles, 86

Rule-Based Model, 214

Running a Process, 242

Sales, 8, 12

Sampling, 88, 109, 271

SAR, 313

Saving a Process, 242

Saving Process Results, 249

Script, 438

SDF, 315

Segment-Level Feature Extraction, 334, 339, 340

Segment-Level Features, 347

Segmentation, 340

Select Attributes, 109, 145, 287

Select Examples, 271

Selecting Attributes, 82

Selecting Columns, 82

Selecting Examples, 287

Selecting Machine Learning Algorithms, 289, 425

Sensor Data, 10

Sentence Tokenization, 208, 212

Sentiment Analysis, 8, 208

Series Plot, 173, 176

Similarity, 22

Similarity Measure, 127, 335

Similarity Score, 135

Similarity-Based Content Recommendation, 134

Similarity-Based Model, 214, 226

Singular Value Decomposition, 137

SMILES, 315, 316

SMS, 193, 195, 197

Social Media Analysis, 208, 215

Spam Detection, 193, 195, 364

Sparse Data Format, 122, 124

SPR, 313

SQL, 106

SQL Database, 140

Star Schema, 80

Statistical Analysis, 294, 433

Stemming, 135, 208, 237

Stopword Filter, 237

Stopword Removal, 135, 137

Stratification, 88

Stratified Sampling, 271

Structure-Activity Relationship, 313

Structure-Property Relationship, 313

Subprocess, 21, 145, 275

Sum of Squares Item Distribution, 160

Supply Chain Management, 8, 10

Support Vector Clustering, 182

Support Vector Machine, 208, 214, 289, 316, 334, 341, 343

SVM, 208, 214, 289, 316, 334, 341, 343

t-test, 434

Target Attribute, 11, 13, 14

Target Property, 313

Target Variable, 11, 13

Teacher Assistant Evaluation Data, 35

Telecommunications, 7, 9

Term Frequency, 208, 236

Term N-Grams, 240, 250

Text Analysis, 10

Text Categorization, 207, 234

Text Classification, 193, 194, 199, 207, 234

Text Clustering, 234, 240, 242, 250

Text Data, 207, 233

Text Document Filtering, 194

Text Message Spam, 193

Text Mining, 10, 135, 193, 197, 207, 233, 234

Text Processing, 135, 200

Text Processing Extension for RapidMiner, 235

Text Representation, 211, 213

TF-IDF, 208, 214, 236

TF-IDF Word Vector Representation, 135

Thermography, 281

Threshold, 90, 94

Time Series Analysis, 8

Time Series Forecasting, 8

Token, 212

Token Filter, 237

Token Length Filter, 137

Tokenization, 208, 212, 222, 237

Tokenization of Text Documents, 137

Tokenizing Text Documents, 135, 197

Trading Analytics, 8

Training, 96

Training Cycles, 291

Transform Cases, 222, 237

Transport, 8

Trend Analysis, 8, 10, 234

True Negative, 96

True Positive, 96

True Positive Rate, 96

True Positives, 275

Type Conversion, 93, 111, 321

Unicode, 208, 211

Unigrams, 211

Unstructured Data, 207, 233, 334, 339

Unsupervised Learning, 158

Up-Selling, 9

Update, 235

Updates, 234

URL, 229

User-Item Matrix, 141

UTF-8, 208, 211

Utility Matrix, 122

Validation, 22, 37, 48, 87, 146, 148, 150, 202, 220, 292, 318, 325

Value Type, 82, 92, 111

Value Type Conversion, 321

Value Type Transformation, 82, 111

Value Types, 14

Variables, 11, 336

Video Recommender System, 121, 126, 134

View, 19

Virtual Drug Screening, 319

Visualization, 116, 157, 173

Wallace Indices, 161

Web Mining, 208

Web Mining Extension for RapidMiner, 208, 229

Web Page Language Identification, 228

Web Services, 138, 208, 228

Weight Attribute, 14

Weighted Regularized Matrix Factorization, 122

Weka, 273

Weka Extension for RapidMiner, 235

Word Frequency, 197

Word List, 135, 197, 198, 229, 249, 250

Word Stemming, 208, 237

Word Vector, 135, 197, 198, 222

Wrapper Validation, 150

Wrapper X-Validation, 263–265

X-Validation, 22, 87, 95, 151, 202, 220, 263, 275, 290, 326

XML, 208, 235

Y-Randomization, 326