Chapter 17

Title: Medical Data Mining


Chapter 17 provides an introduction to medical data mining, an overview of methods often used for classification, regression, clustering, and association rule generation in this domain, and two application use cases with data about patients suffering from carpal tunnel syndrome and diabetes, respectively.

In the carpal tunnel syndrome (CTS) study, thermographic images of hands were collected to construct a predictive classification model for CTS, which could support the search for a non-invasive diagnostic method. The temperatures of different areas of each patient’s hand were extracted from the image and saved in the dataset. Using a RapidMiner preprocessing operator for aggregation, the temperatures were averaged for all segments of the thermal images. Different machine learning algorithms, including artificial neural networks and support vector machines (SVM), were evaluated for generating a classification model capable of diagnosing CTS on the basis of very subtle temperature differences that are invisible to the human eye in a thermographic image.
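The aggregation step can be illustrated outside RapidMiner with a minimal Python sketch. It assumes the readings are stored as (patient, segment, temperature) records; the record layout, the segment names, and the temperature values are made up for illustration and are not taken from the chapter's dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (patient_id, hand_segment, temperature in degrees Celsius).
# The layout and the values are illustrative, not the chapter's data.
readings = [
    ("p1", "thumb", 31.0),
    ("p1", "thumb", 32.0),
    ("p1", "palm", 33.0),
    ("p2", "thumb", 30.0),
    ("p2", "palm", 34.0),
    ("p2", "palm", 35.0),
]


def average_by_segment(rows):
    """Group temperatures by (patient, segment) and return the mean of each group."""
    groups = defaultdict(list)
    for patient_id, segment, temperature in rows:
        groups[(patient_id, segment)].append(temperature)
    return {key: mean(values) for key, values in groups.items()}


averages = average_by_segment(readings)
for (patient_id, segment), value in sorted(averages.items()):
    print(patient_id, segment, value)
```

Each averaged segment temperature would then serve as one numeric attribute of the patient's example when a classifier is trained.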

In the diabetes study, the goal was to evaluate the level of knowledge and overall perceptions of diabetes mellitus type 2 (DM) within the older population in North-East Slovenia. As a chronic disease, diabetes represents a substantial burden for the patient. To achieve good self-care, patients need to be qualified and able to make decisions about managing the disease on a daily basis. A high level of knowledge about the disease is therefore necessary for the patient to act as a partner in managing it. Research questions were posed to determine the general knowledge about diabetes among diabetic patients 65 years and older, and how that knowledge differs with regard to education and place of residence across six topics: (1) diet, (2) HbA1c, (3) hypoglycemia management, (4) activity, (5) effect of illness and infection on blood sugar levels, and (6) foot care. A hypothesis about the level of general knowledge of diabetes in older populations living in urban and rural areas was formulated and tested in the study.

The study was a cross-sectional study of older (age >65 years), non-insulin dependent patients with diabetes mellitus type 2 who visited a family physician, a DM outpatient clinic, or a private specialist practice, or who were living in a nursing home. The Slovenian version of the Michigan Diabetes Knowledge test was used for data collection.

In the data preprocessing, missing values were replaced. K-means clustering was then used to find groups of similar patients, and a decision tree learner was used to find the attributes that discriminate the clusters and to generate a classification model for them. A grouped ANOVA (ANalysis Of VAriance) statistical test confirmed the hypothesis that the level of knowledge about diabetes differs between rural and city populations in the age group of 65 years and older.
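As a rough illustration of the statistic behind an ANOVA comparison of two groups, the following Python sketch computes a one-way F value by hand. The knowledge-test scores are invented for the example, the function name is hypothetical, and the sketch stops at the F statistic: it does not compute a p-value and does not reproduce RapidMiner's ANOVA operator.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df) for a one-way ANOVA."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]

    # Variation of the group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(group) * (group_mean - grand_mean) ** 2
        for group, group_mean in zip(groups, group_means)
    )
    # Variation of individual scores around their own group mean.
    ss_within = sum(
        (value - group_mean) ** 2
        for group, group_mean in zip(groups, group_means)
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Invented knowledge-test scores, NOT data from the study.
urban_scores = [16.0, 18.0, 17.0, 19.0, 20.0]
rural_scores = [12.0, 14.0, 13.0, 15.0, 16.0]

f_stat, df_b, df_w = one_way_anova_f([urban_scores, rural_scores])
print(f_stat, df_b, df_w)
```

A large F value relative to the F distribution with the given degrees of freedom indicates that the group means differ by more than the within-group scatter would explain.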

Table of Contents

17.1 Background
17.2 Description of Problem Domain: Two Medical Examples
17.2.1 Carpal Tunnel Syndrome
17.2.2 Diabetes
17.3 Data Mining Algorithms in Medicine
17.3.1 Predictive Data Mining
Classification and Regression
17.3.2 Descriptive Data Mining
Association Rules
17.3.3 Data Mining and Statistics: Hypothesis Testing
17.4 Knowledge Discovery Process in RapidMiner: Carpal Tunnel Syndrome
17.4.1 Defining the Problem, Setting the Goals
17.4.2 Dataset Representation
17.4.3 Data Preparation
FilterExample Operator
RemoveUnusedValues Operator
SelectAttributes Operator
SetRole Operator
17.4.4 Modeling
17.4.5 Selecting Appropriate Methods for Classification
XValidation Operator
Apply Model Operator
Neural Network Operator
Performance Operator
17.4.6 Results and Data Visualisation
17.4.7 Interpretation of the Results
17.4.8 Hypothesis Testing and Statistical Analysis
Rename Operator
Generate Attributes Operator
Append Operator
NumericalToBinominal Operator
Aggregate Operator
ANOVA Operator
17.4.9 Results and Visualisation
17.5 Knowledge Discovery Process in RapidMiner: Diabetes
17.5.1 Problem Definition, Setting the Goals
17.5.2 Data Preparation
Replace Missing Values Operator
17.5.3 Modeling
K-means Operator
Decision Tree Operator
17.5.4 Results and Data Visualization
17.5.5 Hypothesis Testing
17.6 Specifics in Medical Data Mining
17.7 Summary
17.7 Bibliography
