N-tuple topological/geometric cutoffs for 3D N-linear algebraic molecular codifications: variability, linear independence and QSAR analysis

C. R. García-Jacas, Y. Marrero-Ponce, S. J. Barigye, T. Hernández-Ortega, L. Cabrera-Leyva, A. Fernández-Castillo

Research output: Contribution to journalArticlepeer-review

11 Scopus citations


Novel N-tuple topological/geometric cutoffs to consider specific inter-atomic relations in the QuBiLS-MIDAS framework are introduced in this manuscript. These molecular cutoffs permit the taking into account of relations between more than two atoms by using (dis-)similarity multi-metrics and the concepts related with topological and Euclidean-geometric distances. To this end, the kth two-, three- and four-tuple topological and geometric neighbourhood quotient (NQ) total (or local-fragment) spatial-(dis)similarity matrices are defined, to represent 3D information corresponding to the relations between two, three and four atoms of the molecular structures that satisfy certain cutoff criteria. First, an analysis of a diverse chemical space for the most common values of topological/Euclidean-geometric distances, bond/dihedral angles, triangle/quadrilateral perimeters, triangle area and volume was performed in order to determine the intervals to take into account in the cutoff procedures. A variability analysis based on Shannon’s entropy reveals that better distribution patterns are attained with the descriptors based on the cutoffs proposed (QuBiLS-MIDAS NQ-MDs) with regard to the results obtained when all inter-atomic relations are considered (QuBiLS-MIDAS KA-MDs–‘Keep All’). A principal component analysis shows that the novel molecular cutoffs codify chemical information captured by the respective QuBiLS-MIDAS KA-MDs, as well as information not captured by the latter. Lastly, a QSAR study to obtain deeper knowledge of the contribution of the proposed methods was carried out, using four molecular datasets (steroids (STER), angiotensin converting enzyme (ACE), thermolysin inhibitors (THER) and thrombin inhibitors (THR)) widely used as benchmarks in the evaluation of several methodologies. One to four variable QSAR models based on multiple linear regression were developed for each compound dataset following the original division into training and test sets. The results obtained reveal that the novel cutoff procedures yield superior performances relative to those of the QuBiLS-MIDAS KA-MDs in the prediction of the biological activities considered. From the results achieved, it can be suggested that the proposed N-tuple topological/geometric cutoffs constitute a relevant criteria for generating MDs codifying particular atomic relations, ultimately useful in enhancing the modelling capacity of the QuBiLS-MIDAS 3D-MDs.

Original languageEnglish
Pages (from-to)949-975
Number of pages27
JournalSAR and QSAR in Environmental Research
Issue number12
StatePublished - 1 Dec 2016


  • 3D molecular descriptors
  • N-tuple molecular cutoffs
  • QSAR


Dive into the research topics of 'N-tuple topological/geometric cutoffs for 3D N-linear algebraic molecular codifications: variability, linear independence and QSAR analysis'. Together they form a unique fingerprint.

Cite this