Chapter 15

Title: Text Mining with RapidMiner

Summary

Chapter 15 analyses customer-written hotel reviews and ratings collected from the TripAdvisor website. In the first analysis, frequently co-occurring words in the review texts are found using FP-Growth and association rule generation, then visualized in a word-association graph. In the second analysis, the review texts are clustered with k-Means, which reveals groups of similar texts. Both approaches provide insights into the hotels and their customers, such as topics of interest, complaints, quality and service issues, likes, dislikes, and preferences. They could similarly be applied to other kinds of textual reviews and customer feedback.
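
To make the first analysis more concrete, the following is a minimal Python sketch of what "frequently co-occurring words" and an association rule mean for a handful of reviews. It is not the chapter's RapidMiner process and it does not implement FP-Growth: it counts word pairs by brute force, which yields the same frequent pairs on a small example but would not scale the way FP-Growth does. The review strings, the function names `frequent_pairs` and `rule_confidence`, and the support threshold of 0.5 are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy reviews; not taken from the chapter's TripAdvisor data.
REVIEWS = [
    "clean room friendly staff",
    "friendly staff great location",
    "clean room great location",
    "noisy room friendly staff",
]


def tokenize(doc):
    """Lower-case a review and return its set of distinct words."""
    return set(doc.lower().split())


def frequent_pairs(docs, min_support):
    """Return {word_pair: support} for pairs found in at least min_support of docs.

    Support is the fraction of documents containing both words.
    Brute-force pair counting, used here in place of FP-Growth.
    """
    token_sets = [tokenize(d) for d in docs]
    pair_counts = Counter()
    for tokens in token_sets:
        # Sort so each pair has one canonical (alphabetical) order.
        pair_counts.update(combinations(sorted(tokens), 2))
    n = len(token_sets)
    return {pair: count / n
            for pair, count in pair_counts.items()
            if count / n >= min_support}


def rule_confidence(docs, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent.

    This is the fraction of documents containing the antecedent word
    that also contain the consequent word.
    """
    token_sets = [tokenize(d) for d in docs]
    with_antecedent = [t for t in token_sets if antecedent in t]
    if not with_antecedent:
        return 0.0
    return sum(consequent in t for t in with_antecedent) / len(with_antecedent)


pairs = frequent_pairs(REVIEWS, min_support=0.5)
confidence = rule_confidence(REVIEWS, "friendly", "staff")

for pair, support in sorted(pairs.items()):
    print(pair, support)
print("friendly -> staff confidence:", confidence)
```

In a word-association graph of the kind the summary mentions, each frequent pair would typically become an edge between two word nodes, with support or confidence used as the edge weight.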

Table of Contents

15.1 Introduction
15.1.1 Text Mining
15.1.2 Data Description
15.1.3 Running RapidMiner
15.1.4 RapidMiner Text Processing Extension Package
15.1.5 Installing Text Mining Extensions
15.2 Association Mining of Text Document Collection (Process01)
15.2.1 Importing Process01
15.2.2 Operators in Process01
15.2.3 Saving Process01
15.3 Clustering Text Documents (Process02)
15.3.1 Importing Process02
15.3.2 Operators in Process02
15.3.3 Saving Process02
15.4 Running Process01 and Analyzing the Results
15.4.1 Running Process01
15.4.2 Empty Results for Process01
15.4.3 Specifying the Source Data for Process01
15.4.4 Re-Running Process01
15.4.5 Process01 Results
15.4.6 Saving Process01 Results
15.5 Running Process02 and Analyzing the Results
15.5.1 Running Process02
15.5.2 Specifying the Source Data for Process02
15.5.3 Process02 Results
15.6 Conclusions
15.7 Acknowledgment

Data & Processes: Click here to download