TY - GEN
T1 - Scoring with Data
T2 - 2025 6th International Conference on Computers and Artificial Intelligence Technology, CAIT 2025
AU - Orbe, Joaquin
AU - Baldeon-Calisto, Maria
AU - Perez-Perez, Noel
AU - Flores-Moyano, Ricardo
AU - Benitez, Diego
AU - Riofrio, Daniel
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Predicting the outcomes of football matches remains a complex task due to the numerous dynamic and unpredictable factors that can influence the results. This study proposes a data-driven approach to classify English Premier League matches as wins, draws, or losses using machine learning and deep learning techniques applied to data from the last eight seasons. A web scraping tool was developed to systematically collect relevant match statistics and team information. The performance of Random Forest, XGBoost, and TabNet models was evaluated, along with an ensemble model that combines their complementary strengths. Results show that the ensemble model achieves higher predictive accuracy, especially when recent team performance metrics are included. A feature importance analysis highlights variables such as recent form, expected goals, and possession as critical for accurate prediction. Lastly, the ensemble model is benchmarked against external sources, including an AI-based predictor and a professional betting house, providing a comparative assessment of its practical applicability.
KW - Machine learning
KW - Random Forest
KW - TabNet
KW - XGBoost
KW - deep learning
KW - football analytics
KW - match results
UR - https://www.scopus.com/pages/publications/105036037056
U2 - 10.1109/CAIT68620.2025.11424868
DO - 10.1109/CAIT68620.2025.11424868
M3 - Conference contribution
AN - SCOPUS:105036037056
T3 - 2025 6th International Conference on Computers and Artificial Intelligence Technology, CAIT 2025
SP - 69
EP - 74
BT - 2025 6th International Conference on Computers and Artificial Intelligence Technology, CAIT 2025
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 12 December 2025 through 14 December 2025
ER -