TY - JOUR

T1 - Shannon's, mutual, conditional and joint entropy information indices

T2 - Generalization of global indices defined from local vertex invariants

AU - Barigye, Stephen J.

AU - Marrero-Ponce, Yovani

AU - Martínez Santiago, Oscar

AU - Martínez López, Yoan

AU - Pérez-Giménez, Facundo

AU - Torrens, Francisco

PY - 2013

Y1 - 2013

N2 - A new mathematical approach to defining molecular descriptors (MDs), based on information theory concepts, is proposed. The approach stems from a new matrix representation of a molecular graph (G), derived by generalizing an incidence matrix whose row entries correspond to connected subgraphs of a given G, and from the calculation of Shannon's entropy, the negentropy and the standardized information content, plus, for the first time, the mutual, conditional and joint entropy-based MDs associated with G. We also define strategies that generalize the definition of global or local invariants from atomic contributions (local vertex invariants, LOVIs), introducing related metrics (norms), means and statistical invariants. These invariants are applied to a vector whose components express the atomic information content calculated using the Shannon's, mutual, conditional and joint entropy-based atomic information indices. The novel information indices (IFIs) are implemented in the program TOMOCOMD-CARDD. A principal component analysis reveals that the novel IFIs capture structural information not codified by the IFIs implemented in the software DRAGON. A comparative study of the different parameters (e.g. subgraph orders and/or types, invariants and class of MDs) used in the definition of these IFIs reveals several interesting results. Among the classes of indices investigated in this study, the mutual entropy-based indices give the best correlation results in modeling a physicochemical property, namely the partition coefficient of 34 2-furylethylene derivatives. In a comparison with classical MDs, the new IFIs are shown to give good results for various QSPR models.

AB - A new mathematical approach to defining molecular descriptors (MDs), based on information theory concepts, is proposed. The approach stems from a new matrix representation of a molecular graph (G), derived by generalizing an incidence matrix whose row entries correspond to connected subgraphs of a given G, and from the calculation of Shannon's entropy, the negentropy and the standardized information content, plus, for the first time, the mutual, conditional and joint entropy-based MDs associated with G. We also define strategies that generalize the definition of global or local invariants from atomic contributions (local vertex invariants, LOVIs), introducing related metrics (norms), means and statistical invariants. These invariants are applied to a vector whose components express the atomic information content calculated using the Shannon's, mutual, conditional and joint entropy-based atomic information indices. The novel information indices (IFIs) are implemented in the program TOMOCOMD-CARDD. A principal component analysis reveals that the novel IFIs capture structural information not codified by the IFIs implemented in the software DRAGON. A comparative study of the different parameters (e.g. subgraph orders and/or types, invariants and class of MDs) used in the definition of these IFIs reveals several interesting results. Among the classes of indices investigated in this study, the mutual entropy-based indices give the best correlation results in modeling a physicochemical property, namely the partition coefficient of 34 2-furylethylene derivatives. In a comparison with classical MDs, the new IFIs are shown to give good results for various QSPR models.

KW - Conditional entropy

KW - Frequency matrix

KW - Joint entropy

KW - Mutual entropy

KW - Principal component analysis

KW - QSPR

KW - Shannon's entropy

KW - Structural descriptor

KW - Subgraph

UR - http://www.scopus.com/inward/record.url?scp=84888047264&partnerID=8YFLogxK

U2 - 10.2174/1573409911309020003

DO - 10.2174/1573409911309020003

M3 - Article

C2 - 23700990

AN - SCOPUS:84888047264

SN - 1573-4099

VL - 9

SP - 164

EP - 183

JO - Current Computer-Aided Drug Design

JF - Current Computer-Aided Drug Design

IS - 2

ER -