
Unveiling aroma: A machine learning approach to modeling aroma profiles from chemical components

  • David Flores
  • Carolina Lanas
  • Gabriel Velez-Malo
  • Rafael Amaya-Gómez
  • Nicolas Ratkovich
  • Maria Baldeon-Calisto*
  • *Corresponding author for this work
  • Universidad San Francisco de Quito
  • Universidad de los Andes Colombia

Research output: Contribution to journal › Article › Peer-review

Abstract

Accurately predicting the aroma of a molecule from its chemical structure is a promising direction for advancing flavor and fragrance development in the food, cosmetic, and pharmaceutical industries. This study explores the potential of machine learning (ML) techniques to predict aroma descriptors from molecules encoded in SMILES format. A newly curated dataset was developed, comprising 5855 molecules annotated with 44 standardized odor descriptors. Three models were evaluated on this dataset: Random Forest, XGBoost, and TabNet. Molecular features were extracted as Morgan fingerprints, and the influence of the fingerprint radius on predictive performance was assessed. To address the significant label imbalance, the MLSMOTE data augmentation strategy was also evaluated. Among the tested models, XGBoost trained with a fingerprint radius of 2 demonstrated the highest predictive performance, achieving a mean AUROC of 0.92, a precision of 0.54, a recall of 0.63, and an F1-score of 0.58. In contrast, TabNet consistently underperformed across all configurations, with statistically significantly lower F1-scores than the other two ML models. External validation on an independent cacao-specific molecular dataset further demonstrated the robustness of the XGBoost model, which achieved a low Hamming loss, a moderately high AUROC, and a moderate recall, indicating effective identification of true-positive aroma descriptors. These findings highlight the potential of ML models, particularly XGBoost, as a reliable, data-driven approach for predicting sensory attributes from molecular structure. Moreover, this approach offers a promising alternative for accelerating aroma design and reducing subjectivity in product development across sensory-driven industries.
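The abstract reports multilabel metrics (mean AUROC, precision, recall, F1-score, Hamming loss) for a molecule-by-descriptor prediction task. The following is a minimal, standard-library-only sketch of how such metrics can be computed from a 0/1 label matrix and a matrix of predicted probabilities. It rests on assumptions the abstract does not confirm: a 0.5 decision threshold, micro averaging for precision/recall/F1, and a per-descriptor (macro) mean for AUROC. The function names and the toy data are hypothetical and are not the authors' code or results.

```python
from typing import List, Optional, Sequence, Tuple

Labels = List[List[int]]    # rows = molecules, columns = odor descriptors (0/1)
Scores = List[List[float]]  # predicted probabilities, same shape as Labels


def binarize(scores: Scores, threshold: float = 0.5) -> Labels:
    """Turn per-descriptor probabilities into 0/1 predictions (assumed 0.5 cut-off)."""
    return [[int(s >= threshold) for s in row] for row in scores]


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of (molecule, descriptor) cells that are predicted incorrectly."""
    cells = [(t, p) for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr)]
    return sum(t != p for t, p in cells) / len(cells)


def micro_prf(y_true: Labels, y_pred: Labels) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 with all descriptors pooled together."""
    tp = fp = fn = 0
    for tr, pr in zip(y_true, y_pred):
        for t, p in zip(tr, pr):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auroc(truth: Sequence[int], score: Sequence[float]) -> Optional[float]:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(truth, score) if t == 1]
    neg = [s for t, s in zip(truth, score) if t == 0]
    if not pos or not neg:
        return None  # undefined when a descriptor has only one class in the data
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auroc(y_true: Labels, scores: Scores) -> float:
    """Macro average of per-descriptor AUROC, skipping undefined descriptors."""
    values = []
    for j in range(len(y_true[0])):
        a = auroc([row[j] for row in y_true], [row[j] for row in scores])
        if a is not None:
            values.append(a)
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy data: 4 molecules x 3 hypothetical descriptors (e.g. fruity, nutty, floral).
    y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
    scores = [[0.9, 0.2, 0.6], [0.3, 0.7, 0.4], [0.8, 0.4, 0.1], [0.1, 0.45, 0.7]]
    y_pred = binarize(scores)
    print(f"Hamming loss: {hamming_loss(y_true, y_pred):.3f}")  # 0.083
    p, r, f = micro_prf(y_true, y_pred)
    print(f"micro P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=1.000 R=0.833 F1=0.909
    print(f"mean AUROC: {mean_auroc(y_true, scores):.3f}")  # 0.917
```

On this toy matrix, a single missed descriptor gives a Hamming loss of 1/12 and a recall of about 0.83, while the threshold-free mean AUROC stays near 0.92. This shows how a ranking metric and threshold-dependent metrics such as recall and F1-score can tell different stories about the same model.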

Original language: English
Article number: 101962
Journal: Applied Food Research
Volume: 6
Issue number: 1
State: Published, Jun 2026

Keywords

  • Aroma prediction
  • Deep learning
  • Isomeric SMILES
  • Machine Learning
  • Olfactory descriptors
  • QSOR
  • Quantitative structure-odor relationship
  • Sensory profile prediction

