
Building a Generalized Framework for Analyzing Public Procurement Data from the Kapak Database

  • Universidad San Francisco de Quito

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Public procurement systems generate large, complex datasets that can reveal corruption risks; however, analyzing these datasets is challenging because of unstructured formats, fragmented sources, and technical barriers. This paper introduces a data pipeline that streamlines the processing of procurement data from Ecuador's Official Public Procurement System (SOCE, for its acronym in Spanish), building on the Kapak project, which uses big data and data science to promote transparency. Specifically, Kapak implements a web crawler that periodically collects extensive procurement data from SOCE, Ecuador's public procurement website. Nevertheless, the raw format of this data, comprising base64-encoded USHAY files, fragmented documents, and scattered JSON files, hinders effective analysis. To address this, our pipeline automates decoding, file reconstruction, and dataset consolidation, ensuring that the data is ready for analysis. We then evaluated the pipeline on Reverse Electronic Auction (REA) documents, using TF-IDF and cosine similarity to detect patterns among high-risk procurement processes. The analysis focused on technical specifications, bid documents, and stakeholder Question and Answer (Q&A) records. Overall, the pipeline offers a generalizable, modular framework that shifts the focus from preprocessing to analysis. In addition, it supports integration with advanced Natural Language Processing (NLP) models, making it a valuable tool for corruption detection and public procurement oversight.
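The abstract does not specify the USHAY file format, Kapak's JSON schema, or the exact TF-IDF variant used, so the following Python sketch only illustrates the general idea under stated assumptions. It is not the authors' implementation. Each JSON record is assumed to carry a hypothetical `process_id` and a base64-encoded UTF-8 `content` field. Fragments that share a `process_id` are concatenated to rebuild one document per process. Documents are then compared using TF-IDF weights (raw term counts times a smoothed IDF, ln((1 + N) / (1 + df)) + 1, similar to scikit-learn's default) and cosine similarity. The helper `make_record` and the `REA-00x` identifiers are hypothetical and exist only for the demo.

```python
"""Sketch: decode base64 JSON fragments, rebuild documents, compare with TF-IDF.

Assumptions (hypothetical, not Kapak's actual schema or method):
each record is a JSON object with "process_id" and a base64-encoded,
UTF-8 "content" field; fragments of one document share a process_id.
"""
import base64
import json
import math
import re
from collections import Counter


def decode_payload(encoded: str) -> str:
    """Decode a base64 string into UTF-8 text."""
    return base64.b64decode(encoded).decode("utf-8")


def consolidate(json_records: list[str]) -> dict[str, str]:
    """Rebuild one text per process by joining its decoded fragments in input order."""
    fragments: dict[str, list[str]] = {}
    for raw in json_records:
        record = json.loads(raw)
        fragments.setdefault(record["process_id"], []).append(
            decode_payload(record["content"])
        )
    return {pid: " ".join(parts) for pid, parts in fragments.items()}


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; Python's \w is Unicode-aware, so accented letters are kept.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Weight = raw term count * smoothed idf, idf = ln((1 + N) / (1 + df)) + 1."""
    tokenized = {pid: tokenize(text) for pid, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    return {
        pid: {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for pid, tokens in tokenized.items()
    }


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty or all zeros."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def make_record(process_id: str, text: str) -> str:
    """Build a record in the assumed format (hypothetical demo helper)."""
    content = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return json.dumps({"process_id": process_id, "content": content})


if __name__ == "__main__":
    # Hypothetical REA records; REA-001 arrives split into two fragments.
    records = [
        make_record("REA-001", "adquisición de computadoras portátiles"),
        make_record("REA-001", "con garantía técnica"),
        make_record("REA-002", "adquisición de computadoras portátiles con garantía"),
        make_record("REA-003", "servicio de limpieza de oficinas"),
    ]
    docs = consolidate(records)
    vectors = tfidf_vectors(docs)
    for other in ("REA-002", "REA-003"):
        print(other, round(cosine(vectors["REA-001"], vectors[other]), 3))
```

With the smoothed IDF, a term that appears in every document gets the minimum weight (1.0) instead of zero, so small corpora still produce meaningful scores. In this toy example, REA-001 scores much higher against REA-002, which shares most of its wording, than against REA-003. A production pipeline would more likely rely on a library vectorizer and on whatever schema Kapak actually uses.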

Original language: English
Host publication title: ETCM 2025 - 9th Ecuador Technical Chapters Meeting
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9798331552640
DOI
Status: Published - 2025
Event: 9th Ecuador Technical Chapters Meeting, ETCM 2025 - Quito, Ecuador
Duration: 21 Oct 2025 to 24 Oct 2025

Publication series

Name: ETCM 2025 - 9th Ecuador Technical Chapters Meeting

Conference

Conference: 9th Ecuador Technical Chapters Meeting, ETCM 2025
Country/Territory: Ecuador
City: Quito
Period: 21/10/25 to 24/10/25
