
Toward Non-invasive Speech Evaluation: Supervised and Unsupervised AI Methods for Detecting Cleft Lip from Audio

  • Malena Loza*
  • David Chamorro
  • Felipe Grijalva
  • *Corresponding author for this work
  • Universidad Rey Juan Carlos
  • Universidad San Francisco de Quito

Research output: Chapter in book/report/conference proceeding; Conference contribution; peer-reviewed

Abstract

Cleft lip and/or palate (CLP) is a prevalent congenital craniofacial anomaly that impairs normal speech articulation. Conventional clinical assessments such as nasoendoscopy and videofluoroscopy, while accurate, are invasive, costly, and require specialized expertise. This study introduces a non-invasive machine learning framework that distinguishes speakers with and without CLP using only acoustic features. Voice recordings from 100 participants (60 controls, 40 CLP) were collected following a standardized Spanish protocol targeting phonemes frequently affected by cleft conditions, including /k/ and /g/. Acoustic embeddings were extracted using the Spanish Wav2Vec 2.0 model, generating 772 features per sample. Supervised models (Support Vector Machines (SVM) and feedforward Neural Networks (NN)) and unsupervised methods (Gaussian Mixture Models (GMM), K-Means, and Spectral Clustering) were implemented and compared. The SVM achieved the highest performance (F1-score = 0.93), followed by the NN (F1-score = 0.91), which showed improved sensitivity to CLP speech. Among the unsupervised approaches, K-Means and GMM outperformed Spectral Clustering, particularly for the /k/ and /g/ phonemes. The /k/ phoneme yielded the highest discrimination (Accuracy = 0.86; ARI = 0.53), followed by /g/ (Accuracy = 0.73; ARI = 0.19). These findings demonstrate that acoustic embeddings effectively capture articulatory features distinctive of CLP and highlight the discriminative relevance of the velar stops /k/ and /g/. The proposed approach offers a scalable, non-invasive, and patient-friendly way to support automated speech assessment and monitoring, which is particularly valuable in low-resource clinical settings.
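
The abstract reports Accuracy and ARI (Adjusted Rand Index) for the unsupervised methods but does not say how they were computed. The following is a minimal sketch, assuming the standard ARI definition and an accuracy taken under the best one-to-one mapping of cluster ids to classes. The function names and the toy labels are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter
from itertools import permutations
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same samples.

    Assumes at least two samples; cluster ids need not match class ids.
    """
    n = len(labels_true)
    # Contingency counts: how many samples share each (class, cluster) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())

    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_pairs - expected) / (max_index - expected)


def clustering_accuracy(labels_true, labels_pred):
    """Accuracy under the best one-to-one mapping of clusters to classes.

    Assumes there are no more clusters than classes (two of each here).
    """
    classes = sorted(set(labels_true))
    clusters = sorted(set(labels_pred))
    best_hits = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for t, p in zip(labels_true, labels_pred))
        best_hits = max(best_hits, hits)
    return best_hits / len(labels_true)


# Toy labels (invented): 0 = control, 1 = CLP; cluster ids are arbitrary.
toy_true = [0, 0, 0, 1, 1, 1]
toy_pred = [1, 1, 0, 0, 0, 0]
toy_ari = adjusted_rand_index(toy_true, toy_pred)
toy_acc = clustering_accuracy(toy_true, toy_pred)
```

Because cluster ids are arbitrary, accuracy has to be computed after matching clusters to classes, whereas ARI is invariant to relabeling and corrects for chance agreement. Under these definitions, a high accuracy can coexist with a modest ARI.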

Original language: English
Title of host publication: Applications of Computational Intelligence - 8th IEEE Colombian Conference, ColCACI 2025, Revised Selected Papers
Editors: Alvaro David Orjuela-Cañón, Jesus A Lopez, Oscar J Suarez
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 68-85
Number of pages: 18
ISBN (print): 9783032208996
DOI
Status: Published - 2026
Event: 8th IEEE Colombian Conference on Applications of Computational Intelligence, ColCACI 2025 - Armenia, Colombia
Duration: 27 Aug 2025 to 29 Aug 2025

Publication series

Name: Communications in Computer and Information Science
Volume: 2846 CCIS
ISSN (print): 1865-0929
ISSN (electronic): 1865-0937

Conference

Conference: 8th IEEE Colombian Conference on Applications of Computational Intelligence, ColCACI 2025
Country/Territory: Colombia
City: Armenia
Period: 27/08/25 to 29/08/25
