Digital Inclusion and Culture: Training LLaMA-2 to Empower Kichwa Communities

James Leon, Daniel Riofrio, Felipe Grijalva, Kuymi Tambaco

Research output: Chapter in book/report/conference proceedings, Conference contribution, Peer-reviewed

Abstract

Language serves as a fundamental thread in humanity's social tapestry, intertwining individuality with collective identity. More than a mere instrument, it is culture, tradition, and, most significantly, the very definition of oneself. Historical narratives, often penned by a 'privileged' minority, marginalize linguistic 'others,' echoing a dissonance that threatens lesser-spoken languages.

In this era of technological renaissance, Natural Language Processing (NLP) enables machines to derive meaning from human language, most notably through neural network architectures such as the Transformer. However, the scarcity of digital resources for lesser-spoken languages is a major obstacle to their inclusion in global digital dialogue. We present URKU, a model meticulously fine-tuned for Kichwa and evaluated on a rigorously designed test set informed by the insights of linguistic experts. URKU applies Low-Rank Adaptation (LoRA) to Meta's open-source large language model, LLaMA-2, and specializes in Ecuadorian Kichwa. We contrast URKU's potential abilities with OpenAI's widely available, diversity-promoting custom-GPT feature, highlighting LoRA's capacity to foster linguistic diversity.

This study demonstrates the feasibility of integrating minority languages into cutting-edge AI, marking a technical leap toward digital inclusivity. We emphasize the importance of language in supporting active participation and in developing new knowledge bases, both of which are critical for democratizing information and empowering these communities. Our efforts advocate for a digital future that honors every voice, ensuring equitable representation beyond dominant narratives.
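
LoRA, as named in the abstract, keeps a pretrained weight matrix W frozen and learns a low-rank update to it. The sketch below illustrates only the generic formulation W' = W + (alpha / r) B A in plain Python; it is not URKU's training code, and the rank r, the scaling factor alpha, and the helper names `matmul`, `lora_merge`, and `trainable_params` are hypothetical choices made for illustration.

```python
# Minimal, framework-free illustration of a generic LoRA-style low-rank update.
# Assumption: the standard formulation W' = W + (alpha / r) * (B @ A);
# URKU's actual rank, alpha, and target layers are not stated in this record.


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_merge(w, a, b, alpha):
    """Return W + (alpha / r) * (B @ A).

    w: frozen weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)  # adapter rank = number of rows in A
    delta = matmul(b, a)  # low-rank update, shape (d_out, d_in)
    scale = alpha / r
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_params(d_out, d_in, r):
    """Compare parameters trained by LoRA (A and B) with a full update of W."""
    return r * (d_in + d_out), d_out * d_in


if __name__ == "__main__":
    W = [[1, 0, 0], [0, 1, 0]]  # toy frozen weight
    A = [[1, 2, 3]]             # rank-1 adapter, shape (1, 3)
    B = [[1], [0]]              # shape (2, 1)
    print(lora_merge(W, A, B, alpha=2))
    print(trainable_params(4096, 4096, 8))
```

For a hypothetical 4096 x 4096 projection with rank 8, `trainable_params(4096, 4096, 8)` returns (65536, 16777216), so the adapter trains roughly 0.4% of the parameters that a full update of that matrix would. This reduction is the main reason LoRA lowers the cost of adapting a model the size of LLaMA-2.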

Original language: English
Host publication title: 2024 10th International Conference on eDemocracy and eGovernment, ICEDEG 2024
Editors: Luis Teran, Jhonny Pincay, Carmen Vaca, Daniel Riofrio
Publisher: Institute of Electrical and Electronics Engineers Inc.
Edition: 2024
ISBN (electronic): 9798350365535
DOI
Status: Published - 2024
Event: 10th International Conference on eDemocracy and eGovernment, ICEDEG 2024, Lucerne, Switzerland
Duration: 24 Jun 2024 to 26 Jun 2024

Conference

Conference: 10th International Conference on eDemocracy and eGovernment, ICEDEG 2024
Country/Territory: Switzerland
City: Lucerne
Period: 24/06/24 to 26/06/24
