TY - GEN
T1 - Toward Non-invasive Speech Evaluation
T2 - 8th IEEE Colombian Conference on Applications of Computational Intelligence, ColCACI 2025
AU - Loza, Malena
AU - Chamorro, David
AU - Grijalva, Felipe
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
AB - Cleft lip and/or palate (CLP) is a prevalent congenital craniofacial anomaly that impairs normal speech articulation. Conventional clinical assessments such as nasoendoscopy and videofluoroscopy, while accurate, are invasive, costly, and require specialized expertise. This study introduces a non-invasive machine learning framework that distinguishes speakers with and without CLP using only acoustic features. Voice recordings from 100 participants (60 controls, 40 with CLP) were collected following a standardized Spanish protocol targeting phonemes frequently affected by cleft conditions, including /k/ and /g/. Acoustic embeddings were extracted with the Spanish Wav2Vec 2.0 model, yielding 772 features per sample. Two supervised models, Support Vector Machines (SVM) and feedforward Neural Networks (NN), and three unsupervised methods, Gaussian Mixture Models (GMM), K-Means, and Spectral Clustering, were implemented and compared. The SVM achieved the highest performance (F1-score = 0.93), followed by the NN (F1-score = 0.91), which showed improved sensitivity to CLP speech. Among the unsupervised approaches, K-Means and GMM outperformed Spectral Clustering, particularly for the /k/ and /g/ phonemes. The /k/ phoneme yielded the highest discrimination (Accuracy = 0.86; ARI = 0.53), followed by /g/ (Accuracy = 0.73; ARI = 0.19). These findings demonstrate that acoustic embeddings capture articulatory characteristics distinctive of CLP speech and highlight the discriminative relevance of the velar stops /k/ and /g/. The proposed approach offers a scalable, non-invasive, and patient-friendly tool for automated speech assessment and monitoring that is particularly valuable in low-resource clinical settings.
UR - https://www.scopus.com/pages/publications/105037646977
U2 - 10.1007/978-3-032-20900-9_6
DO - 10.1007/978-3-032-20900-9_6
M3 - Conference contribution
AN - SCOPUS:105037646977
SN - 9783032208996
T3 - Communications in Computer and Information Science
SP - 68
EP - 85
BT - Applications of Computational Intelligence - 8th IEEE Colombian Conference, ColCACI 2025, Revised Selected Papers
A2 - Orjuela-Cañón, Alvaro David
A2 - Lopez, Jesus A
A2 - Suarez, Oscar J
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 27 August 2025 through 29 August 2025
ER -