Examining the predictive accuracy of the novel 3D N-linear algebraic molecular codifications on benchmark datasets

César R. García-Jacas, Ernesto Contreras-Torres, Yovani Marrero-Ponce, Mario Pupo-Meriño, Stephen J. Barigye, Lisset Cabrera-Leyva

Research output: Contribution to journal › Article › peer-review

20 Citations (Scopus)


Background: Recently, novel 3D alignment-free molecular descriptors (known as QuBiLS-MIDAS) based on two-linear, three-linear and four-linear algebraic forms have been introduced. These descriptors codify chemical information for relations between two, three and four atoms by using several (dis-)similarity metrics and multi-metrics. Several studies aimed at assessing the quality of these novel descriptors have been performed. However, a deeper analysis of their performance is necessary. Therefore, the present manuscript assesses and statistically validates the performance of these descriptors in QSAR studies. Results: To this end, eight molecular datasets (angiotensin converting enzyme, acetylcholinesterase inhibitors, benzodiazepine receptor, cyclooxygenase-2 inhibitors, dihydrofolate reductase inhibitors, glycogen phosphorylase b, thermolysin inhibitors, thrombin inhibitors) widely used as benchmarks in the evaluation of several procedures are utilized. QSAR models with three to nine variables, based on Multiple Linear Regression, are built for each chemical dataset according to the original division into training/test sets. Comparisons of the leave-one-out cross-validation correlation coefficients ($Q_{loo}^{2}$) reveal that the models based on QuBiLS-MIDAS indices possess superior predictive ability in 7 of the 8 datasets analyzed, outperforming methodologies based on similar or more complex techniques such as Partial Least Squares, Neural Networks, Support Vector Machines and others. In addition, superior external correlation coefficients ($Q_{ext}^{2}$) are attained in 6 of the 8 test sets considered, confirming the good predictive power of the obtained models.
For the $Q_{ext}^{2}$ values, non-parametric statistical tests were performed, which demonstrated that the models based on QuBiLS-MIDAS indices have the best global performance and yield significantly better predictions than 11 of the 12 QSAR procedures used in the comparison. Lastly, a study of the performance of the indices under several conformer generation methods was performed. It showed that the quality of the predictions of the QSAR models based on QuBiLS-MIDAS indices depends on the 3D structure generation method considered, although in this preliminary study the differences among the results are not statistically significant. Conclusions: The QuBiLS-MIDAS indices are suitable for extracting structural information from molecules and thus constitute a promising alternative for building models that contribute to the prediction of pharmacokinetic, pharmacodynamic and toxicological properties of novel compounds.
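
The two statistics compared in the abstract can be illustrated with a minimal sketch, which is not the authors' code. It assumes an ordinary least-squares Multiple Linear Regression with an intercept, takes the leave-one-out coefficient as 1 - PRESS/TSS, and takes the external coefficient in the variant that uses the training-set mean in the denominator (the abstract does not say which variant was used). The function names `fit_mlr`, `predict_mlr`, `q2_loo` and `q2_ext` are hypothetical.

```python
import numpy as np


def fit_mlr(X, y):
    """Fit y = b0 + X @ b by ordinary least squares; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Predict responses for descriptor matrix X with fitted coefficients."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def q2_loo(X, y):
    """Leave-one-out Q^2: 1 - PRESS / TSS on the training set."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i              # drop compound i
        coef = fit_mlr(X[keep], y[keep])      # refit without it
        pred = predict_mlr(coef, X[i:i + 1])[0]
        press += (y[i] - pred) ** 2           # squared prediction error
    return 1.0 - press / np.sum((y - y.mean()) ** 2)


def q2_ext(X_train, y_train, X_test, y_test):
    """External Q^2 (assumed variant: training-set mean in the denominator)."""
    coef = fit_mlr(X_train, y_train)
    pred = predict_mlr(coef, X_test)
    ss_res = np.sum((y_test - pred) ** 2)
    ss_tot = np.sum((y_test - y_train.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

Here `X` would be a compounds-by-descriptors array holding the three to nine selected indices and `y` the measured activities. Values close to 1 indicate predictions close to the observed activities, and both statistics can be negative for a poorly predictive model.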

Original language: English
Article number: 122
Journal: Journal of Cheminformatics
Status: Published - 25 Feb 2016

