
Digital Inclusion and Culture: Training LLaMA-2 to Empower Kichwa Communities

  • Universidad San Francisco de Quito

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Language serves as a fundamental thread in humanity's social tapestry, intertwining individuality with collective identity. More than a mere instrument, it embodies culture, tradition, and, most significantly, the very definition of oneself. Historical narratives, often penned by a 'privileged' minority, marginalize linguistic 'others,' echoing a dissonance that threatens lesser-spoken languages. In this era of technological renaissance, Natural Language Processing (NLP) allows meaning to be drawn from human language through machine computation, specifically by means of neural network architectures such as Transformers. However, the scarcity of digital resources for these languages is a major obstacle to their inclusion in global digital dialogue. We present URKU, a model fine-tuned for Kichwa and assessed on a test set designed with input from linguistic experts. URKU applies Low-Rank Adaptation (LoRA) to Meta's open-source large language model, LLaMA-2, and specializes in Ecuadorian Kichwa. We contrast URKU's potential abilities with OpenAI's custom-GPT feature, which promotes availability and diversity, pointing to the capacity of LoRA to foster linguistic diversity. This study demonstrates that integrating minority languages into cutting-edge AI is feasible, showcasing a technical step towards digital inclusivity. We emphasize the importance of language in supporting active participation and the development of new knowledge bases, which are critical for the democratization of information and the empowerment of these communities. Our efforts advocate for a digital future that honors every voice, ensuring equitable representation beyond dominant narratives.

Original language: English
Title of host publication: 2024 10th International Conference on eDemocracy and eGovernment, ICEDEG 2024
Editors: Luis Teran, Jhonny Pincay, Carmen Vaca, Daniel Riofrio
Publisher: Institute of Electrical and Electronics Engineers Inc.
Edition: 2024
ISBN (Electronic): 9798350365535
DOIs
State: Published - 2024
Event: 10th International Conference on eDemocracy and eGovernment, ICEDEG 2024 - Lucerne, Switzerland
Duration: 24 Jun 2024 to 26 Jun 2024

Conference

Conference: 10th International Conference on eDemocracy and eGovernment, ICEDEG 2024
Country/Territory: Switzerland
City: Lucerne
Period: 24/06/24 to 26/06/24

Keywords

  • LLM
  • LLaMA-2
  • LoRA
  • language and society
  • representation
  • under-resourced languages
