
Fine-Tuning Wav2Vec2 for Low-Resource Kichwa Automatic Speech Recognition

  • Universidad San Francisco de Quito
  • Carnegie Mellon University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In recent years, advancements in artificial intelligence (AI) have significantly accelerated the development of natural language processing and automatic speech recognition (ASR) systems for high-resource languages, raising concerns about the marginalization of ancestral and underrepresented languages. In this context, this work explores the fine-tuning of the Wav2Vec 2.0 model, developed by Meta AI, for ASR in Kichwa, a low-resource language spoken in the Ecuadorian Andes. The training process utilized two datasets totaling approximately 8 hours of audio, segmented into clips ranging from 1.5 to 5 seconds, with manually aligned transcriptions created using ELAN software. Fine-tuning was performed using the Connectionist Temporal Classification (CTC) loss function. After multiple experiments, a two-tailed Wilcoxon signed-rank test revealed no statistically significant improvement when applying SpecAugment. The best-performing model, trained without data augmentation, achieved promising results on the test set: a Word Error Rate (WER) of 0.262, a Character Error Rate (CER) of 0.120, and a Match Error Rate (MER) of 0.401. These findings demonstrate the viability of adapting pre-trained self-supervised models to low-resource settings and underscore the potential of ASR technologies to support greater linguistic inclusivity in artificial intelligence.
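The three error rates reported in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions based on a Levenshtein alignment with hits H, substitutions S, deletions D, and insertions I: WER = (S + D + I) / (H + S + D), MER = (S + D + I) / (H + S + D + I), and CER as the WER formula applied to characters. The function names (`edit_counts`, `wer`, `mer`, `cer`) and the toy strings are hypothetical; the record does not show the authors' evaluation code or say how scores were aggregated over the test set, so this may differ from their computation.

```python
def edit_counts(ref, hyp):
    """Return (hits, substitutions, deletions, insertions) for two token sequences."""
    n, m = len(ref), len(hyp)
    # cost[i][j]: minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace one optimal alignment to classify each step.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                hits += 1
            else:
                subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return hits, subs, dels, ins


def wer(reference, hypothesis):
    """Word Error Rate: edits divided by the number of reference words."""
    h, s, d, i = edit_counts(reference.split(), hypothesis.split())
    return (s + d + i) / (h + s + d)


def mer(reference, hypothesis):
    """Match Error Rate: edits divided by all aligned word positions."""
    h, s, d, i = edit_counts(reference.split(), hypothesis.split())
    return (s + d + i) / (h + s + d + i)


def cer(reference, hypothesis):
    """Character Error Rate: the WER formula applied to characters."""
    h, s, d, i = edit_counts(list(reference), list(hypothesis))
    return (s + d + i) / (h + s + d)


# Toy placeholder strings, not Kichwa data.
reference = "a b c d"
hypothesis = "a x c d e"  # one substitution (b -> x), one insertion (e)
print(wer(reference, hypothesis))  # 2 edits / 4 reference words = 0.5
print(mer(reference, hypothesis))  # 2 edits / 5 aligned positions = 0.4
```

In practice these scores are usually computed over a whole test set, either by pooling the counts across utterances or by averaging per-utterance scores; the two choices can give different numbers.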

Original language: English
Title of host publication: ETCM 2025 - 9th Ecuador Technical Chapters Meeting
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331552640
State: Published - 2025
Event: 9th Ecuador Technical Chapters Meeting, ETCM 2025 - Quito, Ecuador
Duration: 21 Oct 2025 to 24 Oct 2025

Publication series

Name: ETCM 2025 - 9th Ecuador Technical Chapters Meeting

Conference

Conference: 9th Ecuador Technical Chapters Meeting, ETCM 2025
Country/Territory: Ecuador
City: Quito
Period: 21/10/25 to 24/10/25

Keywords

  • Audio
  • Automatic Speech Recognition
  • Connectionist Temporal Classification
  • Deep Learning
  • Fine-Tuning
  • Kichwa
