TY - GEN
T1 - Unlocking Student Success
T2 - 13th International Symposium on Digital Forensics and Security, ISDFS 2025
AU - Perez, Margorie
AU - Navarrete, Danny
AU - Baldeon-Calisto, Maria
AU - Guerrero, Yuvinne
AU - Sarmiento, Andre
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Student dropout remains a significant challenge for Higher Education Institutions (HEIs), affecting academic planning and student success. This study applies traditional machine learning and deep learning models to predict student dropout in an Ecuadorian HEI using the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. A comprehensive analysis of demographic, academic, and economic factors was conducted to develop an effective predictive framework. The evaluated models include Logistic Regression, Support Vector Machine, Random Forest, XGBoost, Feedforward Neural Network, and TabNet. Various configurations were tested, including the application of Principal Component Analysis (PCA) for dimensionality reduction, and the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance. Experimental results reveal that PCA and SMOTE are unnecessary. Among the models, Random Forest achieved the highest performance with a 96.62% accuracy, a ROC-AUC of 0.92, and an F1-score of 0.94. Feature importance analysis identified cumulative GPA and the number of semesters completed as the most influential factors for student dropout, followed by failed courses, high school grades, and entrance exam scores. This study emphasizes the importance of model interpretability, allowing HEIs to translate predictive insights into actionable strategies. By informing student retention policies and optimizing recruitment processes, this research contributes to data-driven decision-making in higher education.
AB - Student dropout remains a significant challenge for Higher Education Institutions (HEIs), affecting academic planning and student success. This study applies traditional machine learning and deep learning models to predict student dropout in an Ecuadorian HEI using the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. A comprehensive analysis of demographic, academic, and economic factors was conducted to develop an effective predictive framework. The evaluated models include Logistic Regression, Support Vector Machine, Random Forest, XGBoost, Feedforward Neural Network, and TabNet. Various configurations were tested, including the application of Principal Component Analysis (PCA) for dimensionality reduction, and the Synthetic Minority Over-sampling Technique (SMOTE) to address class imbalance. Experimental results reveal that PCA and SMOTE are unnecessary. Among the models, Random Forest achieved the highest performance with a 96.62% accuracy, a ROC-AUC of 0.92, and an F1-score of 0.94. Feature importance analysis identified cumulative GPA and the number of semesters completed as the most influential factors for student dropout, followed by failed courses, high school grades, and entrance exam scores. This study emphasizes the importance of model interpretability, allowing HEIs to translate predictive insights into actionable strategies. By informing student retention policies and optimizing recruitment processes, this research contributes to data-driven decision-making in higher education.
KW - Deep Learning
KW - Dropout Prediction in Higher Education
KW - Higher Education Institutions
KW - Machine Learning
KW - Neural Networks
KW - TabNet
UR - http://www.scopus.com/inward/record.url?scp=105008496899&partnerID=8YFLogxK
U2 - 10.1109/ISDFS65363.2025.11012013
DO - 10.1109/ISDFS65363.2025.11012013
M3 - Conference contribution
AN - SCOPUS:105008496899
T3 - ISDFS 2025 - 13th International Symposium on Digital Forensics and Security
BT - ISDFS 2025 - 13th International Symposium on Digital Forensics and Security
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 24 April 2025 through 25 April 2025
ER -