TY - JOUR
T1 - A comparative study of nonlinear machine learning for the "in silico" depiction of tyrosinase inhibitory activity from molecular structure
AU - Le-Thi-Thu, Huong
AU - Marrero-Ponce, Yovani
AU - Casañola-Martin, Gerardo M.
AU - Cardoso, Gladys Casas
AU - Chávez, Maria Del Carmen
AU - Garcia, María M.
AU - Morell, Carlos
AU - Torrens, Francisco
AU - Abad, Concepción
PY - 2011/6
Y1 - 2011/6
N2 - In the present report, support vector machine (SVM), artificial neural network (ANN), Bayesian networks (BNs), and k-nearest neighbor (k-NN) techniques are, for the first time, applied and compared on two "in-house" data sets to describe tyrosinase inhibitory activity from molecular structure. Data I, used for the identification of tyrosinase inhibitors (TIs), includes 701 active and 728 inactive compounds. Data II consists of active chemicals for potency estimation of TIs. The 2D TOMOCOMD-CARDD atom-based quadratic indices are used as molecular descriptors. The derived models show rather encouraging results, with areas under the receiver operating characteristic curve (AURC) in the test set above 0.943 and 0.846 for Data I and Data II, respectively. Multiple comparison tests are carried out to compare the performance of the models and reveal the improvement of machine learning (ML) techniques with respect to statistical ones (see Chemometr. Intell. Lab. Syst. 2010, 104, 249). In some cases, these improvements are statistically significant. The tests also demonstrate that k-NN, despite being a rather simple approach, performs best on both data sets. The obtained results suggest that ML-based models could help to improve virtual screening procedures, and that the combination of these different techniques can increase the practicality of data mining of chemical databases for the discovery of novel TIs as possible depigmenting agents.
AB - In the present report, support vector machine (SVM), artificial neural network (ANN), Bayesian networks (BNs), and k-nearest neighbor (k-NN) techniques are, for the first time, applied and compared on two "in-house" data sets to describe tyrosinase inhibitory activity from molecular structure. Data I, used for the identification of tyrosinase inhibitors (TIs), includes 701 active and 728 inactive compounds. Data II consists of active chemicals for potency estimation of TIs. The 2D TOMOCOMD-CARDD atom-based quadratic indices are used as molecular descriptors. The derived models show rather encouraging results, with areas under the receiver operating characteristic curve (AURC) in the test set above 0.943 and 0.846 for Data I and Data II, respectively. Multiple comparison tests are carried out to compare the performance of the models and reveal the improvement of machine learning (ML) techniques with respect to statistical ones (see Chemometr. Intell. Lab. Syst. 2010, 104, 249). In some cases, these improvements are statistically significant. The tests also demonstrate that k-NN, despite being a rather simple approach, performs best on both data sets. The obtained results suggest that ML-based models could help to improve virtual screening procedures, and that the combination of these different techniques can increase the practicality of data mining of chemical databases for the discovery of novel TIs as possible depigmenting agents.
KW - Atom-based quadratic index
KW - Machine learning technique
KW - Multiple comparison test
KW - Tyrosinase inhibitor
UR - http://www.scopus.com/inward/record.url?scp=79960574460&partnerID=8YFLogxK
U2 - 10.1002/minf.201100021
DO - 10.1002/minf.201100021
M3 - Article
AN - SCOPUS:79960574460
SN - 1868-1743
VL - 30
SP - 527
EP - 537
JO - Molecular Informatics
JF - Molecular Informatics
IS - 6-7
ER -