Title: Application of RapidMiner in Neutrino Astronomy


Chapter 16 describes a data mining use case in astroparticle physics: the application of automated classification and automated feature selection in neutrino astronomy to separate a small number of neutrinos from a large number of background events (muons). The main scientific goals of neutrino telescopes include the detection of neutrinos originating from astrophysical sources and a precise measurement of the energy spectrum of neutrinos produced in cosmic ray air showers in the Earth’s atmosphere. These so-called atmospheric neutrinos, however, are hidden in a large background of atmospheric muons that are also produced in air showers. The first step in rejecting this background is the selection of upward-going tracks, since the Earth is opaque to muons but can be traversed by neutrinos up to very high energies. This procedure reduces the background by roughly three orders of magnitude. A detailed analysis of atmospheric neutrinos, however, requires a very clean sample with a purity above 95%. The main source of remaining background at this stage is muon tracks falsely reconstructed as upward-going. These falsely reconstructed muon tracks still outnumber the signal by three orders of magnitude and have to be rejected by straight cuts or multivariate methods. Given this ratio of noise (muons) to signal (neutrinos), about 10,000 particles need to be recorded in order to catch about 10 neutrinos. Hence, the amount of data delivered by these experiments is enormous, and it must be processed and analyzed within a reasonable amount of time. Moreover, the data are delivered in a format that contains more than 2000 attributes originating from various reconstruction algorithms. Most of these attributes have been reconstructed from only a few physical quantities.
The direction of a neutrino event penetrating the detector at a certain angle can, for example, be reconstructed from the pattern of light initiated by particles that are produced when the neutrino interacts close to or even inside the detector. Because not all of the 2000 reconstructed attributes are equally well suited for classification, the first task in applying data mining techniques in neutrino astronomy is to find a good and reliable representation of the dataset in fewer dimensions. This task very often determines the quality of the overall data analysis. The second task is the training and evaluation of a stable learning algorithm with very high performance that separates signal from background events. Here, the challenge lies in the skewed class distribution: there are many more background (negative) examples than signal (positive) examples. Handling such skewed distributions is necessary in many real-world problems. Accordingly, the application of RapidMiner in neutrino astronomy models the separation of neutrinos from background as a two-step process. The first part of this chapter explains the feature (attribute) selection; the second part explains how a learner is trained to select the relevant events from the masses of incoming data. For the feature selection, the Feature Selection Extension for RapidMiner is used, together with a wrapper cross-validation that evaluates the performance of the feature selection methods. For the selection of the relevant events, Random Forests are used as the classification learner.
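
To give a feeling for how demanding the purity requirement is, the following short Python sketch combines the two numbers quoted above (about 10 neutrinos per 10,000 recorded events, and a target purity above 95%) and computes how small the fraction of surviving muons has to be. It is only an illustration and not part of the RapidMiner process described in the chapter: the function names are hypothetical, and the signal efficiency of 80% is an assumed value chosen purely for the example.

```python
def purity(n_signal, n_background, eff_signal, eff_background):
    """Fraction of selected events that are true signal (neutrinos)."""
    kept_signal = n_signal * eff_signal
    kept_background = n_background * eff_background
    return kept_signal / (kept_signal + kept_background)


def max_background_efficiency(n_signal, n_background, eff_signal, target_purity):
    """Largest background pass rate that still reaches the target purity.

    Obtained by solving purity(...) == target_purity for eff_background.
    """
    return (eff_signal * n_signal * (1.0 - target_purity)
            / (target_purity * n_background))


# Numbers from the text: roughly 10 neutrinos per 10,000 recorded events.
n_total = 10_000
n_signal = 10
n_background = n_total - n_signal  # remaining events are muons

# Assumption for illustration only: the selection keeps 80% of the neutrinos.
eff_signal = 0.8

eff_bkg_limit = max_background_efficiency(n_signal, n_background, eff_signal, 0.95)
check = purity(n_signal, n_background, eff_signal, eff_bkg_limit)

print(f"background pass rate must stay below {eff_bkg_limit:.2e}")
print(f"purity at that limit: {check:.3f}")
```

Under these assumptions the allowed muon pass rate comes out at a few times 10^-5, which illustrates why both a careful choice of attributes and a very well performing classifier are needed.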

16.1 Protons, Photons, and Neutrinos
16.2 Neutrino Astronomy
16.3 Feature Selection
16.3.1 Installation of the Feature Selection Extension for RapidMiner
16.3.2 Feature Selection Setup
16.3.3 Inner Process of the Loop Parameters Operator
16.3.4 Inner Operators of the Wrapper X-Validation
16.3.5 Settings of the Loop Parameters Operator
16.3.6 Feature Selection Stability
16.4 Event Selection Using a Random Forest
16.4.1 The Training Setup
16.4.2 The Random Forest in Greater Detail
16.4.3 The Random Forest Settings
16.4.4 The Testing Setup
16.5 Summary and Outlook
