Exploring the QSAR’s predictive truthfulness of the novel N-tuple discrete derivative indices on benchmark datasets

O. Martínez-Santiago, Y. Marrero-Ponce, R. Vivas-Reyes, O. M. Rivera-Borroto, E. Hurtado, M. A. Treto-Suarez, Y. Ramos, F. Vergara-Murillo, M. E. Orozco-Ugarriza, Y. Martínez-López

Research output: Contribution to journal › Article › Peer review

7 Citations (Scopus)


Graph derivative indices (GDIs) have recently been defined over N atoms simultaneously (N = 2, 3 and 4). They are based on the concept of the derivative in discrete mathematics (finite differences), by analogy with the derivative in classical mathematical analysis. These molecular descriptors (MDs) encode topo-chemical and topo-structural information through the derivative of a molecular graph with respect to a given event (S) over duplex, triplex and quadruplex relations of atoms (vertices). GDIs have been successfully applied to describe physicochemical properties such as reactivity, solubility and chemical shift, among others, and in several comparative quantitative structure-activity/property relationship (QSAR/QSPR) studies. Although previous modelling studies with these indices gave satisfactory results, more rigorous analyses are needed to assess the true predictive performance of this novel structure codification. In the present paper, therefore, the performance of these approaches in QSAR studies is assessed and statistically validated, and compared with that of other QSAR procedures reported in the literature. To this end, QSARs were developed on eight chemical datasets widely used as benchmarks for evaluating and validating QSAR methods and many different MDs (mainly 3D MDs). Three- to seven-variable QSAR models were built for each dataset, following the original training/test split. The models were developed by multiple linear regression (MLR) coupled with a genetic algorithm as the wrapper feature-selection technique, using the MobyDigs software. Each family of GDIs (duplex, triplex and quadruplex) behaved similarly across all modelling, with some exceptions.
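The GA-MLR wrapper step described above can be illustrated with a minimal, pure-Python sketch. It is not MobyDigs: the helper names (`fit_mlr`, `q2_loo`, `ga_select`), the fixed model size `k`, the use of leave-one-out Q² as fitness, and the crossover/mutation operators are all assumptions chosen for illustration. Each chromosome is a set of `k` descriptor columns; an MLR model is fitted on those columns and scored, so the regression itself acts as the "wrapper" around the search.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mlr(X, y, cols):
    """Ordinary least squares (normal equations) on the selected columns plus an intercept."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, x, cols):
    return coef[0] + sum(b * x[c] for b, c in zip(coef[1:], cols))


def q2_loo(X, y, cols):
    """Leave-one-out Q2 = 1 - PRESS / total sum of squares (assumed fitness function)."""
    press = 0.0
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:], cols)
        press += (y[i] - predict(coef, X[i], cols)) ** 2
    mean = sum(y) / len(y)
    return 1.0 - press / sum((yi - mean) ** 2 for yi in y)


def ga_select(X, y, k, pop_size=20, generations=30, seed=0):
    """Hypothetical GA over fixed-size descriptor subsets; returns (best_columns, best_q2)."""
    rng = random.Random(seed)
    n_desc = len(X[0])
    cache = {}

    def score(c):
        if c not in cache:
            try:
                cache[c] = q2_loo(X, y, list(c))
            except ValueError:  # singular design matrix: worst possible fitness
                cache[c] = float("-inf")
        return cache[c]

    pop = [tuple(sorted(rng.sample(range(n_desc), k))) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist truncation: keep the best half of the distinct chromosomes.
        pop = sorted(set(pop), key=score, reverse=True)[: max(2, pop_size // 2)]
        children = []
        while len(pop) + len(children) < pop_size:
            a, b = rng.sample(pop, 2) if len(pop) > 1 else (pop[0], pop[0])
            # Crossover: draw k distinct genes from the union of both parents.
            genes = rng.sample(sorted(set(a) | set(b)), k)
            if rng.random() < 0.3:
                # Mutation: replace one gene with a descriptor not yet in the subset.
                unused = [c for c in range(n_desc) if c not in genes]
                if unused:
                    genes[rng.randrange(k)] = rng.choice(unused)
            children.append(tuple(sorted(genes)))
        pop = pop + children
    best = max(pop, key=score)
    return best, score(best)
```

For example, with `X` a list of descriptor rows and `y` the response values, `ga_select(X, y, k=3)` returns the best three-column subset found and its leave-one-out Q². Selected models would still need to be checked on the external test set, since an internal fitness alone can be optimistic.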
However, when all families were used in combination, the results were quantitatively better than those reported by other authors in similar experiments. Comparisons of external correlation coefficients (q²ext) revealed that the GDI-based models have superior predictive ability in seven of the eight datasets analysed, outperforming methodologies based on similar or more complex techniques and confirming the good predictive power of the models obtained. For the q²ext values, a non-parametric comparison gave results significantly different from those reported so far: the models based on DIVATI's indices had the best overall performance and yielded significantly better predictions than the twelve 0D–3D QSAR procedures used in the comparison. Therefore, GDIs are suitable for encoding molecular structure and constitute a good alternative for building QSARs to predict physicochemical, biological and environmental endpoints.
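The external validation and the non-parametric comparison can be sketched as follows. The abstract does not state which q²ext formula or which non-parametric test was used, so this assumes one common q²ext definition (test-set PRESS divided by the test-set sum of squares around the training-set mean, often called Q²F1) and a Friedman test on average ranks across datasets. The names `q2_ext`, `average_ranks` and `friedman_chi2` are hypothetical helpers, not part of any tool named in the paper.

```python
def q2_ext(y_test, y_pred, y_train_mean):
    """Assumed Q2_F1 variant: 1 - PRESS_test / SS of test values around the training-set mean."""
    press = sum((yo - yp) ** 2 for yo, yp in zip(y_test, y_pred))
    ss = sum((yo - y_train_mean) ** 2 for yo in y_test)
    return 1.0 - press / ss


def average_ranks(scores):
    """scores[d][m] = q2_ext of method m on dataset d.

    Within each dataset, rank 1 is the highest score; tied methods share the
    mean of the ranks they occupy. Returns each method's rank averaged over datasets.
    """
    n_methods = len(scores[0])
    totals = [0.0] * n_methods
    for row in scores:
        order = sorted(range(n_methods), key=lambda m: -row[m])
        i = 0
        while i < n_methods:
            j = i
            while j + 1 < n_methods and row[order[j + 1]] == row[order[i]]:
                j += 1
            for t in range(i, j + 1):
                totals[order[t]] += (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
            i = j + 1
    return [t / len(scores) for t in totals]


def friedman_chi2(scores):
    """Friedman statistic: 12N/(k(k+1)) * (sum of squared average ranks - k(k+1)^2/4), no tie correction."""
    n, k = len(scores), len(scores[0])
    r = average_ranks(scores)
    return 12 * n / (k * (k + 1)) * (sum(x * x for x in r) - k * (k + 1) ** 2 / 4)
```

In a setting like this paper's, `scores` would have one row per benchmark dataset and one column per QSAR procedure, and the statistic would be compared with a chi-square distribution with k - 1 degrees of freedom. `scipy.stats.friedmanchisquare` provides a tested implementation that also corrects for ties, which this sketch omits.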

Original language: English
Pages (from-to): 367-389
Number of pages: 23
Journal: SAR and QSAR in Environmental Research
Status: Published - 4 May 2017

