Building a Generalized Framework for Analyzing Public Procurement Data from the Kapak Database

  • Universidad San Francisco de Quito

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Public procurement systems generate large, complex datasets that can reveal corruption risks; however, analyzing these datasets is challenging due to unstructured formats, fragmented sources, and technical barriers. This paper introduces a data pipeline that streamlines the processing of procurement data from Ecuador's Official Public Procurement System (SOCE, for its acronym in Spanish), expanding on the Kapak project, which uses big data and data science to promote transparency. Specifically, Kapak implements a web crawler that periodically collects extensive procurement data from the SOCE website. Nevertheless, the raw format of the collected data, which comprises base64-encoded USHAY files, fragmented documents, and scattered JSON files, hinders effective analysis. To address this, our pipeline automates decoding, file reconstruction, and dataset consolidation, thereby ensuring that the data is ready for analysis. We evaluate the pipeline on Reverse Electronic Auction (REA) documents, using TF-IDF and cosine similarity to detect patterns among high-risk procurement processes. The analysis focuses on technical specifications, bid documents, and stakeholder Question and Answer (Q&A) records. Overall, the pipeline offers a generalizable, modular framework that shifts focus from preprocessing to analysis. In addition, it supports integration with advanced Natural Language Processing (NLP) models, making it a valuable tool for corruption detection and public procurement oversight.
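The decoding and similarity steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sample texts, the UTF-8 assumption for decoded payloads, and the smoothed IDF variant are all illustrative choices, and the real pipeline may handle USHAY files and tokenization quite differently.

```python
import base64
import math
import re
from collections import Counter


def decode_document(encoded: str) -> str:
    """Decode a base64 payload to text (UTF-8 is an assumption here)."""
    return base64.b64decode(encoded).decode("utf-8", errors="replace")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors.append({
            # Smoothed IDF: ln((1 + N) / (1 + df)) + 1, one common variant.
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample texts standing in for decoded procurement documents.
raw = [
    "supply of laptop computers with 16 GB memory",
    "supply of laptop computers with 16 GB memory and carrying case",
    "road maintenance services for rural highways",
]
# Simulate the crawler's base64-encoded storage, then decode it again.
encoded = [base64.b64encode(t.encode("utf-8")).decode("ascii") for t in raw]
docs = [decode_document(e) for e in encoded]

vectors = tfidf_vectors(docs)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # near-duplicate specifications
sim_02 = cosine_similarity(vectors[0], vectors[2])  # unrelated specifications
```

In this sketch, the two near-identical specifications score higher than the unrelated pair, which shares no terms and therefore scores zero. Unusually high similarity between documents from different processes is the kind of pattern such an analysis could surface for closer review.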

Original language: English
Title of host publication: ETCM 2025 - 9th Ecuador Technical Chapters Meeting
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798331552640
State: Published - 2025
Event: 9th Ecuador Technical Chapters Meeting, ETCM 2025 - Quito, Ecuador
Duration: 21 Oct 2025 to 24 Oct 2025

Publication series

Name: ETCM 2025 - 9th Ecuador Technical Chapters Meeting

Conference

Conference: 9th Ecuador Technical Chapters Meeting, ETCM 2025
Country/Territory: Ecuador
City: Quito
Period: 21/10/25 to 24/10/25

Keywords

  • automated workflows
  • big data
  • corruption risk detection
  • data pipeline
  • Ecuador
  • Kapak project
  • Natural Language Processing (NLP)
  • public accountability
  • Public procurement
  • Reverse Electronic Auction (REA)
  • SOCE
  • transparency
