
Improving Dysarthria Assessment Through Voice Conversion in Low-Resource Settings

  • Emily Chimbo*
  • Felipe Grijalva
  • Karen Rosero
  • *Corresponding author for this work
  • Universidad San Francisco de Quito
  • Carnegie Mellon University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The development of automatic dysarthria assessment systems is often limited by the scarcity of labeled pathological speech, particularly in low-resource clinical environments. In this work, we explore the use of generative voice conversion to address this bottleneck. We propose a pipeline based on speech enhancement and voice conversion to transfer dysarthric vocal traits onto healthy utterances to generate realistic pathological speech. A total of 4,082 synthetic samples were generated and enhanced to improve quality while preserving pathological prosody. A dual-branch CNN trained under four experimental setups showed that combining real and synthetic data improved the classification accuracy from 77.52% to 98.36%, while using only enhanced synthetic data reached 97.10%. These results support the use of voice conversion to expand clinical datasets and reduce dependence on real patient recordings.

Original language: English
Title of host publication: ETCM 2025 - 9th Ecuador Technical Chapters Meeting
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9798331552640
DOI
Status: Published - 2025
Event: 9th Ecuador Technical Chapters Meeting, ETCM 2025 - Quito, Ecuador
Duration: 21 Oct 2025 to 24 Oct 2025

Publication series

Name: ETCM 2025 - 9th Ecuador Technical Chapters Meeting

Conference

Conference: 9th Ecuador Technical Chapters Meeting, ETCM 2025
Country/Territory: Ecuador
City: Quito
Period: 21/10/25 to 24/10/25
