Chapter 18

Title: Using PaDEL to Calculate Molecular Properties and Chemoinformatic Models


Chapter 18 covers a use case from chemistry and the pharmaceutical industry. The RapidMiner extension PaDEL (Pharmaceutical Data Exploration Laboratory), developed at the National University of Singapore, is used to calculate a variety of molecular properties from the 2-D or 3-D molecular structures of chemical compounds. From these molecular property vectors, RapidMiner can then generate models that predict chemical, biochemical, or biological properties, a frequently encountered task in theoretical chemistry and the pharmaceutical industry. The combination of RapidMiner and PaDEL provides an open source solution for building prediction systems for a broad range of biological properties and effects. One application in drug design is predicting the effects and side effects of a new drug candidate before it is even produced. This helps to avoid testing candidates that are probably unhelpful or possibly even harmful, and so focuses research resources on the more promising ones. With PaDEL and RapidMiner, properties can be calculated for any molecular structure, even if the compound is not physically accessible. Because both tools are open source and compute the properties of a molecular structure quickly, they can significantly reduce the cost and increase the speed of developing new chemical compounds and drugs with desired properties: more candidate molecules can be screened automatically, and fewer need to be synthesized and tested physically, chemically, or biologically. The combination of a data mining tool (RapidMiner) and a tool for handling molecules (PaDEL) provides a convenient and user-friendly way to build accurate models of the relationship between chemical structures and the property to be predicted, most often a biological activity.
These relationships can be formulated as qualitative structure-property relationships (SPRs), qualitative structure-activity relationships (SARs), or quantitative structure-activity relationships (QSARs). SPR models aim to highlight associations between molecular structures and a target property, such as lipophilicity. SAR models correlate an activity with structural properties, and QSAR models predict an activity quantitatively. Models are typically developed for properties that are difficult to obtain, impossible to measure, require time-consuming experiments, or depend on a variety of other complex properties. They can also be used to predict a complicated property from several simple ones. The PaDEL extension enables RapidMiner to read and handle molecular structures directly, to calculate their molecular properties, and then to build models that predict chemical, biochemical, or biological properties of these structures from the calculated values. In this chapter, linear regression is used as the QSAR modeling technique to predict chemical properties with RapidMiner based on molecular properties computed by PaDEL.
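To make the idea of a linear-regression QSAR model concrete, the following is a minimal sketch in plain Python. It is not the RapidMiner Linear Regression operator and does not call PaDEL. It assumes that two descriptors per molecule have already been calculated, and it fits an ordinary least-squares model by solving the normal equations. The descriptor names (molecular weight, hydrogen-bond donor count), the descriptor values, and the coefficients used to generate the target property are all synthetic and chosen only for illustration.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    # Prepend a constant 1.0 to every row so that b[0] becomes the intercept.
    X = [[1.0] + list(row) for row in rows]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("descriptors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coefficients, row):
    """Apply the fitted model: intercept plus weighted sum of descriptors."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


# Synthetic descriptor table (assumption, not real measurements):
# each row is [molecular_weight, hydrogen_bond_donor_count].
descriptors = [
    [100.0, 1.0],
    [150.0, 0.0],
    [200.0, 2.0],
    [250.0, 1.0],
    [300.0, 3.0],
]

# Synthetic target property generated from known, made-up coefficients
# so that the fit can be checked against them.
TRUE_COEFFICIENTS = [0.5, 0.01, -0.3]
target_property = [predict(TRUE_COEFFICIENTS, row) for row in descriptors]

coefficients = fit_linear_regression(descriptors, target_property)

# Predict the property for a structure that was not in the training table.
new_molecule = [180.0, 1.0]
prediction = predict(coefficients, new_molecule)
```

Because the synthetic target is an exact linear function of the descriptors, the fitted coefficients should match the generating ones up to floating-point error. With real data the fit would be approximate, and a model would normally be validated on held-out compounds before its predictions are trusted.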

Table of Contents

18.1 Introduction
18.2 Molecular Structure Formats for Chemoinformatics
18.3 Installation of the PaDEL Extension for RapidMiner
18.4 Applications and Capabilities of the PaDEL Extension
18.5 Examples of Computer-aided Predictions
18.6 Calculation of Molecular Properties
18.7 Generation of a Linear Regression Model
18.8 Example Workflow
18.9 Summary
Acknowledgment
Bibliography
