TY - JOUR
T1 - Unveiling aroma
T2 - A machine learning approach to modeling aroma profiles from chemical components
AU - Flores, David
AU - Lanas, Carolina
AU - Velez-Malo, Gabriel
AU - Amaya-Gómez, Rafael
AU - Ratkovich, Nicolas
AU - Baldeon-Calisto, Maria
N1 - Publisher Copyright:
Copyright © 2026. Published by Elsevier B.V.
PY - 2026/6
Y1 - 2026/6
N2 - Accurately predicting the aroma of a molecule based on its chemical structure is a promising direction for advancing the development of flavors and fragrances in the food, cosmetic, and pharmaceutical industries. This study explores the potential of machine learning (ML) techniques to predict aroma descriptors from molecules encoded in SMILES format. A newly curated dataset was developed, comprising 5855 molecules annotated with 44 standardized odor descriptors. Three models were evaluated on this dataset, namely Random Forest, XGBoost, and TabNet. Molecular feature extraction was conducted using Morgan Fingerprints, and the influence of fingerprint radius on predictive performance was assessed. To address the significant label imbalance, the MLSMOTE data augmentation strategy was also evaluated. Among the tested models, XGBoost trained with a fingerprint radius of 2 demonstrated the highest predictive performance. It achieved a mean AUROC of 0.92, a precision of 0.54, a recall of 0.63, and an F1-score of 0.58. In contrast, TabNet consistently underperformed across all configurations, with statistically lower F1 scores than the other two ML models. External validation on an independent cacao-specific molecular dataset further demonstrated the robustness of the XGBoost model, which achieved a low Hamming Loss, a moderately high AUROC, and a moderate Recall, indicating effective identification of true-positive aroma descriptors. These findings highlight the potential of ML models, particularly XGBoost, as a reliable, data-driven approach for predicting sensory attributes from molecular structure. Moreover, this approach offers a promising alternative for accelerating aroma design and reducing subjectivity in product development across sensory-driven industries.
KW - Aroma prediction
KW - Deep learning
KW - Isomeric SMILES
KW - Machine Learning
KW - Olfactory descriptors
KW - QSOR
KW - Quantitative structure-odor relationship
KW - Sensory profile prediction
UR - https://www.scopus.com/pages/publications/105035593703
U2 - 10.1016/j.afres.2026.101962
DO - 10.1016/j.afres.2026.101962
M3 - Article
AN - SCOPUS:105035593703
SN - 2772-5022
VL - 6
JO - Applied Food Research
JF - Applied Food Research
IS - 1
M1 - 101962
ER -