TY - JOUR
T1 - QSAR models for tyrosinase inhibitory activity description applying modern statistical classification techniques
T2 - A comparative study
AU - Le-Thi-Thu, Huong
AU - Cardoso, Gladys Casas
AU - Casañola-Martin, Gerardo M.
AU - Marrero-Ponce, Yovani
AU - Puris, Amilkar
AU - Torrens, Francisco
AU - Rescigno, Antonio
AU - Abad, Concepción
N1 - Funding Information:
M-P. Y. and C-M. G. M. thank the Estades Temporals per a Investigadors Convidats program for a fellowship to work at Valencia University (2010). C-M. G. M. also thanks Professor Cosme Santiesteban-Toca and the Departamento de Bioinformática y Automatización de Procesos Biológicos, Centro de Bioplantas, for partial support of this paper. F. T. acknowledges financial support from the Spanish MEC DGI (Project No. CTQ2004-07768-C02-01/BQU) and Generalitat Valenciana (DGEUI INF01-051 and INFRA03-047, and OCYT GRUPOS03-173). The authors also acknowledge partial financial support from the Spanish Ministry of Science and Innovation (Project reference: SAF2009-10399). Last but not least, this work was supported in part by VLIR (Vlaamse InterUniversitaire Raad, Flemish Interuniversity Council, Belgium) under the IUC Program VLIR-UCLV.
PY - 2010/12/15
Y1 - 2010/12/15
AB - Cluster analysis (CA), Linear and Quadratic Discriminant Analysis (L(Q)DA), Binary Logistic Regression (BLR) and Classification Tree (CT) are applied to two datasets to describe tyrosinase inhibitory activity from molecular structure. The first dataset, which includes 701 tyrosinase inhibitors (TIs), is used to classify compounds as inhibitory or non-inhibitory; the second is used to estimate the potency of the active compounds. 2D TOMOCOMD-CARDD atom-based quadratic indices are computed as molecular descriptors. CA is used for the "rational" design of the training (TS) and prediction (PS) sets, but it proves inadequate as a classification technique. On the first dataset, the overall accuracies (Q) of the LDA-, QDA-, BLR- and CT-based models are 91.42%, 92.35%, 91.88% and 91.79%, respectively, for the TS, and 91.04%, 92.43%, 88.24% and 89.36% for the PS; the corresponding values on the second dataset are 89.95%, 90.70%, 90.20% and 89.20% for the TS, and 83.71%, 84.44%, 82.96% and 82.22% for the PS. A comparative analysis of the statistical techniques is carried out, taking into account the posterior probabilities they generate, their accuracy, the assumptions they require and the form of the predictor variables used. On both datasets, Receiver Operating Characteristic (ROC) curves together with Multiple Comparison Procedures (MCP) show that QDA generally performs best as a classification algorithm. The results suggest that applying the statistical techniques presented in this report may yield a better description of tyrosinase inhibitory activity, which could increase the practicality of in silico data mining for the discovery of novel TIs.
KW - Atom-based quadratic indices
KW - Modern statistical methods
KW - Multiple Comparison Procedures
KW - ROC curve
KW - TOMOCOMD-CARDD Software
KW - Tyrosinase inhibitor
UR - http://www.scopus.com/inward/record.url?scp=78650309418&partnerID=8YFLogxK
U2 - 10.1016/j.chemolab.2010.08.016
DO - 10.1016/j.chemolab.2010.08.016
M3 - Article
AN - SCOPUS:78650309418
SN - 0169-7439
VL - 104
SP - 249
EP - 259
JO - Chemometrics and Intelligent Laboratory Systems
JF - Chemometrics and Intelligent Laboratory Systems
IS - 2
ER -