Toward Non-invasive Speech Evaluation: Supervised and Unsupervised AI Methods for Detecting Cleft Lip from Audio

  • Malena Loza*
  • David Chamorro
  • Felipe Grijalva
  • *Corresponding author for this work
  • Universidad Rey Juan Carlos
  • Universidad San Francisco de Quito

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Cleft lip and/or palate (CLP) is a prevalent congenital craniofacial anomaly that impairs normal speech articulation. Conventional clinical assessments such as nasoendoscopy and videofluoroscopy, while accurate, are invasive, costly, and require specialized expertise. This study introduces a non-invasive machine learning framework to distinguish between speakers with and without CLP using only acoustic features. Voice recordings from 100 participants (60 controls, 40 CLP) were collected following a standardized Spanish protocol targeting phonemes frequently affected by cleft conditions, including /k/ and /g/. Acoustic embeddings were extracted using the Spanish Wav2Vec 2.0 model, generating 772 features per sample. Two supervised models, Support Vector Machines (SVM) and feedforward Neural Networks (NN), and three unsupervised methods, Gaussian Mixture Models (GMM), K-Means, and Spectral Clustering, were implemented and compared. The SVM achieved the highest performance (F1-score = 0.93), followed by the NN (F1-score = 0.91), which showed improved sensitivity to CLP speech. Among the unsupervised approaches, K-Means and GMM outperformed Spectral Clustering, particularly for the /k/ and /g/ phonemes. The /k/ phoneme yielded the highest discrimination (Accuracy = 0.86; ARI = 0.53), followed by /g/ (Accuracy = 0.73; ARI = 0.19). These findings demonstrate that acoustic embeddings effectively capture articulatory features distinctive of CLP and highlight the discriminative relevance of the velar stops /k/ and /g/. The proposed approach offers a scalable, non-invasive, and patient-friendly solution to support automated speech assessment and monitoring, one that is particularly valuable in low-resource clinical settings.
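
The abstract reports three kinds of scores: F1 for the supervised classifiers, and Accuracy and the Adjusted Rand Index (ARI) for the clustering methods. The sketch below shows, under stated assumptions, how such scores can be computed from label vectors with the Python standard library only. It is not the authors' code: the function names and the ten toy labels are hypothetical, and the label convention (0 = control, 1 = CLP) is an assumption made for illustration.

```python
from collections import Counter
from math import comb


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def clustering_accuracy(y_true, clusters):
    """Accuracy for two clusters with binary labels.

    Cluster ids are arbitrary, so both possible id-to-class mappings
    are tried and the better one is kept.
    """
    n = len(y_true)
    matches = sum(t == c for t, c in zip(y_true, clusters))
    return max(matches, n - matches) / n


def adjusted_rand_index(y_true, clusters):
    """Adjusted Rand Index computed from the contingency table."""
    n = len(y_true)
    index = sum(comb(v, 2) for v in Counter(zip(y_true, clusters)).values())
    a = sum(comb(v, 2) for v in Counter(y_true).values())
    b = sum(comb(v, 2) for v in Counter(clusters).values())
    expected = a * b / comb(n, 2)
    max_index = (a + b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both partitions put every sample in one group.
        return 1.0
    return (index - expected) / (max_index - expected)


# Hypothetical toy labels (assumed convention: 0 = control, 1 = CLP).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]    # output of a supervised classifier
clusters = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]  # cluster ids from a clustering run

f1 = f1_score(y_true, y_pred)
acc = clustering_accuracy(y_true, clusters)
ari = adjusted_rand_index(y_true, clusters)
print(f"F1={f1:.2f}  Accuracy={acc:.2f}  ARI={ari:.2f}")
```

Accuracy needs the cluster ids to be mapped onto classes first, whereas ARI is invariant to how clusters are numbered and is corrected for chance agreement. This is why a clustering can show a fairly high accuracy alongside a much lower ARI, as in the figures reported for /k/ and /g/.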

Original language: English
Title of host publication: Applications of Computational Intelligence - 8th IEEE Colombian Conference, ColCACI 2025, Revised Selected Papers
Editors: Alvaro David Orjuela-Cañón, Jesus A Lopez, Oscar J Suarez
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 68-85
Number of pages: 18
ISBN (Print): 9783032208996
State: Published - 2026
Event: 8th IEEE Colombian Conference on Applications of Computational Intelligence, ColCACI 2025 - Armenia, Colombia
Duration: 27 Aug 2025 to 29 Aug 2025

Publication series

Name: Communications in Computer and Information Science
Volume: 2846 CCIS
ISSN (Print): 1865-0929
ISSN (Electronic): 1865-0937

Conference

Conference: 8th IEEE Colombian Conference on Applications of Computational Intelligence, ColCACI 2025
Country/Territory: Colombia
City: Armenia
Period: 27/08/25 to 29/08/25
