TY - JOUR
T1 - Novel similarity measures for the effective and efficient retrieval of pharmacological datasets
AU - Borroto, Oscar Miguel Rivera
AU - Díaz, Yoandy Hernández
AU - De La Vega, José Manuel García
AU - Del Corazón Grau Ábalo, Ricardo
AU - Ponce, Yovani Marrero
PY - 2011/1
Y1 - 2011/1
N2 - Similarity searching is an important facility in modern chemical information management systems for accessing the rich information contained in today's enormous chemical repositories. Given a molecular representation, a similarity measure, and a matching algorithm, the technique returns a list of dataset molecules ranked in decreasing order of similarity to a query or reference molecule specified by the user. As a consequence, researchers have taken an interest in the performance of molecular representations and similarity measures. However, their studies have focused predominantly on binary representations and the corresponding resemblance measures, and little work has been done on other types of numerical description. Machine learning techniques have also been applied to descriptor selection, though not consistently with the neighbourhood principle. These precedents, together with the need for new methods suited to each chemical context, constitute the motivation for this work. It comprises the implementation, in Java, of two novel similarity measures and their comparison with other proximity models established in the literature in terms of how effectively they retrieve eight pharmacological datasets from medicinal chemistry, represented by real-valued descriptors selected by machine learning, together with an efficient matching algorithm.
KW - Machine learning descriptor selection
KW - Medicinal chemistry datasets
KW - Nearest neighbours
KW - Similarity measures
KW - Similarity search
UR - http://www.scopus.com/inward/record.url?scp=79961055137&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:79961055137
SN - 0001-9704
VL - 68
SP - 50
EP - 56
JO - Afinidad
JF - Afinidad
IS - 551
ER -