TY - JOUR
T1 - Towards better BBB passage prediction using an extensive and curated data set
AU - Brito-Sánchez, Yoan
AU - Marrero-Ponce, Yovani
AU - Barigye, Stephen J.
AU - Yaber-Goenaga, Iván
AU - Morell Pérez, Carlos
AU - Le-Thi-Thu, Huong
AU - Cherkasov, Artem
N1 - Publisher Copyright: © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
PY - 2015/5/1
Y1 - 2015/5/1
N2 - In the present report, the challenging task of drug delivery across the blood-brain barrier (BBB) is addressed via a computational approach. BBB passage was modeled, in terms of log BB, using classification and regression schemes on a novel, extensive and curated data set (the largest, to the best of our knowledge). Prior to model development, data analysis steps comprising chemical data curation and structural, cutoff and cluster analysis (CA) were conducted. Linear Discriminant Analysis (LDA) and Multiple Linear Regression (MLR) were used to fit classification and correlation functions, respectively. The best LDA-based model showed overall accuracies over 85% and 83% for the training and test sets, respectively. An MLR-based model that explains more than 69% of the variance in the experimental log BB, an acceptable level, was also developed. A brief and general interpretation of the proposed models allowed an estimation of how 'near' our computational approach is to the factors that determine the passage of molecules through the BBB. In a final effort, some popular and powerful Machine Learning methods were considered; their performance was comparable to that of the simpler linear techniques. Most of the compounds with anomalous behavior were set aside in a set denoted the controversial set, and a discussion of these compounds is provided. Finally, our results were compared with methodologies previously reported in the literature, showing comparable to better results. The results could serve as useful tools, available to and reproducible by the whole scientific community, in the early stages of neuropharmaceutical drug discovery/development projects.
KW - BBB endpoint
KW - Blood-brain barrier
KW - Dragon descriptor
KW - Linear discriminant analysis
KW - Multiple linear regression
KW - P-glycoprotein
KW - Quantitative structure pharmacokinetic (property) relationship
UR - http://www.scopus.com/inward/record.url?scp=84930640106&partnerID=8YFLogxK
U2 - 10.1002/minf.201400118
DO - 10.1002/minf.201400118
M3 - Article
C2 - 27490276
AN - SCOPUS:84930640106
SN - 1868-1743
VL - 34
SP - 308
EP - 330
JO - Molecular Informatics
JF - Molecular Informatics
IS - 5
ER -