Unveiling aroma: A machine learning approach to modeling aroma profiles from chemical components

  • David Flores
  • Carolina Lanas
  • Gabriel Velez-Malo
  • Rafael Amaya-Gómez
  • Nicolas Ratkovich
  • Maria Baldeon-Calisto*
  • *Corresponding author for this work
  • Universidad San Francisco de Quito
  • Universidad de los Andes Colombia

Research output: Contribution to journal › Article › peer-review

Abstract

Accurately predicting the aroma of a molecule based on its chemical structure is a promising direction for advancing the development of flavors and fragrances in the food, cosmetic, and pharmaceutical industries. This study explores the potential of machine learning (ML) techniques to predict aroma descriptors from molecules encoded in SMILES format. A newly curated dataset was developed, comprising 5855 molecules annotated with 44 standardized odor descriptors. Three models were evaluated on this dataset, namely Random Forest, XGBoost, and TabNet. Molecular feature extraction was conducted using Morgan Fingerprints, and the influence of fingerprint radius on predictive performance was assessed. To address the significant label imbalance, the MLSMOTE data augmentation strategy was also evaluated. Among the tested models, XGBoost trained with a fingerprint radius of 2 demonstrated the highest predictive performance. It achieved a mean AUROC of 0.92, a precision of 0.54, a recall of 0.63, and an F1-score of 0.58. In contrast, TabNet consistently underperformed across all configurations, with statistically lower F1 scores than the other two ML models. External validation on an independent cacao-specific molecular dataset further demonstrated the robustness of the XGBoost model, which achieved a low Hamming Loss, a moderately high AUROC, and a moderate Recall, indicating effective identification of true-positive aroma descriptors. These findings highlight the potential of ML models, particularly XGBoost, as a reliable, data-driven approach for predicting sensory attributes from molecular structure. Moreover, this approach offers a promising alternative for accelerating aroma design and reducing subjectivity in product development across sensory-driven industries.
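The multi-label metrics named in the abstract (Hamming Loss, precision, recall, F1-score) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the toy label matrices are hypothetical, the function names are invented for illustration, and micro-averaging is an assumption, since the abstract does not state which averaging scheme was used. The last line only checks that the reported F1-score of 0.58 is consistent with the harmonic mean of the reported precision (0.54) and recall (0.63).

```python
from typing import List, Tuple

# One row per molecule, one 0/1 column per odor descriptor.
Labels = List[List[int]]


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of (molecule, descriptor) cells where prediction and truth differ."""
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp))
    return wrong / total


def micro_precision_recall_f1(y_true: Labels, y_pred: Labels) -> Tuple[float, float, float]:
    """Micro-averaged scores: pool every cell before counting (an assumption here)."""
    tp = fp = fn = 0
    for rt, rp in zip(y_true, y_pred):
        for t, p in zip(rt, rp):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


# Hypothetical toy data: 3 molecules, 4 descriptors.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 1, 1, 0], [1, 0, 0, 1]]

print(hamming_loss(y_true, y_pred))               # 3 of 12 cells are wrong
print(micro_precision_recall_f1(y_true, y_pred))
print(round(f1_from(0.54, 0.63), 2))              # consistency check on reported values
```

A low Hamming Loss alone can be misleading when labels are sparse, because predicting no descriptors at all already gets most cells right; this is one reason to read it together with recall, as the abstract does.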

Original language: English
Article number: 101962
Journal: Applied Food Research
Volume: 6
Issue number: 1
DOI
Status: Published - Jun 2026
