Abstract
Accurately predicting the aroma of a molecule from its chemical structure is a promising direction for advancing the development of flavors and fragrances in the food, cosmetic, and pharmaceutical industries. This study explores the potential of machine learning (ML) techniques to predict aroma descriptors from molecular structures encoded as SMILES strings. A newly curated dataset was developed, comprising 5855 molecules annotated with 44 standardized odor descriptors. Three models were evaluated on this dataset: Random Forest, XGBoost, and TabNet. Molecular features were extracted as Morgan fingerprints, and the influence of the fingerprint radius on predictive performance was assessed. To address the substantial label imbalance, the MLSMOTE (Multilabel Synthetic Minority Over-sampling Technique) data augmentation strategy was also evaluated. Among the tested models, XGBoost trained with a fingerprint radius of 2 demonstrated the highest predictive performance, achieving a mean AUROC of 0.92, a precision of 0.54, a recall of 0.63, and an F1-score of 0.58. In contrast, TabNet consistently underperformed across all configurations, with statistically significantly lower F1-scores than the other two models. External validation on an independent cacao-specific molecular dataset further demonstrated the robustness of the XGBoost model, which achieved a low Hamming loss, a moderately high AUROC, and a moderate recall, indicating that it effectively identified true-positive aroma descriptors. These findings highlight the potential of ML models, particularly XGBoost, as a reliable, data-driven approach for predicting sensory attributes from molecular structure. Moreover, this approach offers a promising alternative for accelerating aroma design and reducing subjectivity in product development across sensory-driven industries.
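The abstract reports precision, recall, F1-score, and Hamming loss for a multi-label task in which each molecule can carry several of the 44 descriptors at once. The following is a minimal, stdlib-only Python sketch of how these metrics can be computed from 0/1 indicator matrices. Micro-averaging (pooling counts over all descriptors) is an assumption here, since the abstract does not state which averaging was used, and the toy matrices are made up for illustration.

```python
# Multi-label evaluation metrics on binary indicator matrices.
# Each row is one molecule; each column is one odor descriptor (1 = present).


def hamming_loss(y_true, y_pred):
    """Fraction of (molecule, descriptor) cells where prediction and truth disagree."""
    cells = 0
    wrong = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            cells += 1
            wrong += int(t != p)
    return wrong / cells


def micro_precision_recall_f1(y_true, y_pred):
    """Assumption: micro-averaging, i.e. TP/FP/FN are pooled over all descriptors."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy data: 3 molecules x 4 descriptors.
y_true = [[1, 0, 1, 0],
          [0, 1, 0, 0],
          [1, 1, 0, 1]]
y_pred = [[1, 0, 0, 0],
          [0, 1, 1, 0],
          [1, 0, 0, 1]]
print(hamming_loss(y_true, y_pred))  # 0.25
print(micro_precision_recall_f1(y_true, y_pred))
```

With micro-averaging, frequent descriptors dominate the pooled counts; a macro average (the mean of per-descriptor scores) weights rare descriptors equally, which matters under the kind of label imbalance the study describes.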
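A mean AUROC such as the reported 0.92 is obtained by scoring each descriptor separately and averaging. Below is a hedged sketch using the Mann-Whitney formulation of AUROC: the probability that a randomly chosen positive molecule receives a higher score than a randomly chosen negative one, with ties counted as one half. Macro-averaging over descriptors and skipping descriptors that have only one class in the evaluation set are assumptions, not details taken from the abstract; the scores are hypothetical.

```python
def auroc(labels, scores):
    """AUROC for one descriptor via pairwise comparison (ties count 0.5).

    Returns None when the descriptor has only positives or only negatives,
    because AUROC is undefined in that case.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auroc(y_true, y_score):
    """Assumption: macro average over descriptor columns, skipping single-class columns."""
    values = []
    for j in range(len(y_true[0])):
        value = auroc([row[j] for row in y_true], [row[j] for row in y_score])
        if value is not None:
            values.append(value)
    if not values:
        raise ValueError("no descriptor has both positive and negative molecules")
    return sum(values) / len(values)


# Hypothetical toy data: 4 molecules x 2 descriptors, with predicted probabilities.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.3, 0.7], [0.6, 0.4], [0.1, 0.5]]
print(mean_auroc(y_true, y_score))  # 0.875
```

The pairwise loop is quadratic in the number of molecules per descriptor; a sort-based computation of the same quantity scales better for datasets with thousands of molecules.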
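MLSMOTE selects minority labels by comparing each label's imbalance ratio (IRLbl, the count of the most frequent label divided by that label's count) with the mean ratio over all labels (MeanIR). It then creates synthetic instances by interpolating between an instance carrying a minority label and a randomly chosen nearest neighbor that carries the same label. The plain-Python sketch below follows that outline, using linear interpolation of features and the "ranking" rule for the synthetic label set (keep a label if more than half of the seed instance and its neighbors carry it). The neighbor count `k`, the random seed, squared Euclidean distance, and the toy data are assumptions. Interpolating binary Morgan fingerprint bits yields fractional values, and the abstract does not say how the authors handled this.

```python
import random


def imbalance_ratios(Y):
    """IRLbl per label: count of the most frequent label / count of this label."""
    counts = [sum(row[j] for row in Y) for j in range(len(Y[0]))]
    if 0 in counts:
        raise ValueError("every label must occur at least once")
    top = max(counts)
    return [top / c for c in counts]


def minority_labels(Y):
    """Labels whose IRLbl exceeds the mean imbalance ratio (MeanIR)."""
    irl = imbalance_ratios(Y)
    mean_ir = sum(irl) / len(irl)
    return [j for j, r in enumerate(irl) if r > mean_ir]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mlsmote(X, Y, k=3, seed=0):
    """Return synthetic (features, labels) for instances carrying minority labels."""
    rng = random.Random(seed)
    new_X, new_Y = [], []
    for label in minority_labels(Y):
        bag = [i for i, row in enumerate(Y) if row[label] == 1]
        if len(bag) < 2:
            continue  # no neighbor available to interpolate with
        for i in bag:
            # k nearest neighbors of instance i within the minority bag
            ranked = sorted((_sq_dist(X[i], X[j]), j) for j in bag if j != i)
            neighbors = [j for _, j in ranked[:k]]
            ref = rng.choice(neighbors)
            gap = rng.random()
            # Assumption: features are treated as numeric and interpolated linearly
            new_X.append([xi + gap * (xr - xi) for xi, xr in zip(X[i], X[ref])])
            # Ranking rule: keep labels carried by more than half of seed + neighbors
            group = [i] + neighbors
            votes = [sum(Y[g][j] for g in group) for j in range(len(Y[0]))]
            new_Y.append([1 if v > len(group) / 2 else 0 for v in votes])
    return new_X, new_Y


# Hypothetical toy data: 6 molecules, 3 fingerprint bits, 3 descriptors.
X = [[0, 0, 1], [0, 1, 1], [1, 0, 0], [1, 1, 0], [0, 1, 0], [1, 1, 1]]
Y = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]]
new_X, new_Y = mlsmote(X, Y, k=3, seed=0)
print(minority_labels(Y), new_Y)
```

Because every instance in a label's bag carries that label, the ranking rule always keeps the minority label in each synthetic label set, so oversampling raises its frequency relative to the majority labels.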
| Original language | English |
|---|---|
| Article number | 101962 |
| Journal | Applied Food Research |
| Volume | 6 |
| Issue number | 1 |
| DOIs | |
| State | Published - Jun 2026 |
Keywords
- Aroma prediction
- Deep learning
- Isomeric SMILES
- Machine Learning
- Olfactory descriptors
- QSOR
- Quantitative structure-odor relationship
- Sensory profile prediction
Title
Unveiling aroma: A machine learning approach to modeling aroma profiles from chemical components