TY - JOUR
T1 - ElectroPredictor
T2 - An Application to Predict Mayr’s Electrophilicity E through Implementation of an Ensemble Model Based on Machine Learning Algorithms
AU - Cuesta, Sebastián A.
AU - Moreno, Martín
AU - López, Romina A.
AU - Mora, José R.
AU - Paz, José Luis
AU - Márquez, Edgar A.
N1 - Publisher Copyright: © 2023 American Chemical Society.
PY - 2023/1/3
Y1 - 2023/1/3
N2 - Electrophilicity (E) is one of the most important parameters to understand the reactivity of an organic molecule. Although the theoretical electrophilicity index (ω) has been associated with E in a small homologous series, the use of ω to predict E in a structurally heterogeneous set of compounds is not a trivial task. In this study, a robust ensemble model is created using Mayr’s database of reactivity parameters. A combination of topological and quantum mechanical descriptors and different machine learning algorithms are employed for the model’s development. The predictability of the model is assessed using different statistical parameters, and its validation is examined, including a training/test partition, an applicability domain, and a y-scrambling test. The global ensemble model presents a Q²(5-fold) of 0.909 and a Q²(ext) of 0.912, demonstrating an excellent predictability performance of E values and showing that ω is not a good descriptor for the prediction of E, especially for the case of neutral compounds. ElectroPredictor, a noncommercial Python application (https://github.com/mmoreno1/ElectroPredictor), is developed to predict E. QM9, a well-known large dataset containing 133885 neutral molecules, is used to perform a virtual screening (94.0% coverage). Finally, the 10 most electrophilic molecules are analyzed as possible new Mayr’s electrophiles, which have not yet been experimentally tested. This study confirms the necessity to build an ensemble model using nonlinear machine learning algorithms, topographic descriptors, and separating molecules into charged and neutral compounds to predict E with precision.
AB - Electrophilicity (E) is one of the most important parameters to understand the reactivity of an organic molecule. Although the theoretical electrophilicity index (ω) has been associated with E in a small homologous series, the use of ω to predict E in a structurally heterogeneous set of compounds is not a trivial task. In this study, a robust ensemble model is created using Mayr’s database of reactivity parameters. A combination of topological and quantum mechanical descriptors and different machine learning algorithms are employed for the model’s development. The predictability of the model is assessed using different statistical parameters, and its validation is examined, including a training/test partition, an applicability domain, and a y-scrambling test. The global ensemble model presents a Q²(5-fold) of 0.909 and a Q²(ext) of 0.912, demonstrating an excellent predictability performance of E values and showing that ω is not a good descriptor for the prediction of E, especially for the case of neutral compounds. ElectroPredictor, a noncommercial Python application (https://github.com/mmoreno1/ElectroPredictor), is developed to predict E. QM9, a well-known large dataset containing 133885 neutral molecules, is used to perform a virtual screening (94.0% coverage). Finally, the 10 most electrophilic molecules are analyzed as possible new Mayr’s electrophiles, which have not yet been experimentally tested. This study confirms the necessity to build an ensemble model using nonlinear machine learning algorithms, topographic descriptors, and separating molecules into charged and neutral compounds to predict E with precision.
KW - Algorithms
KW - Machine Learning
KW - Databases, Factual
UR - http://www.scopus.com/inward/record.url?scp=85146002191&partnerID=8YFLogxK
U2 - 10.1021/acs.jcim.2c01367
DO - 10.1021/acs.jcim.2c01367
M3 - Article
C2 - 36594600
AN - SCOPUS:85146002191
SN - 1549-9596
VL - 63
SP - 507
EP - 521
JO - Journal of Chemical Information and Modeling
JF - Journal of Chemical Information and Modeling
IS - 2
ER -