TY - GEN
T1 - Multiclass Prediction of Bug Resolution Time Using Context-Aware Machine Learning Models
AU - Vaca, Maritza
AU - Flores-Moyano, Ricardo
AU - Baldeon-Calisto, Maria
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - Accurately predicting bug resolution time is a persistent challenge in software development and is often complicated by subjective factors and noisy data. This work proposes a robust machine learning pipeline that classifies bug resolution time into four ordinal categories (Immediate, Fast, Normal, and Long) by leveraging project metadata from large-scale repositories, such as GitHub and Jira. The methodology includes comprehensive exploratory data analysis, statistical outlier removal, and advanced feature engineering to derive contextual variables that encapsulate historical project-level and component-level behavior. Three machine learning classifiers (Random Forest, XGBoost, and MLPClassifier) were evaluated under various data balancing strategies. While the tree-based models showed signs of overfitting, the MLPClassifier achieved superior generalization when enhanced with engineered features, which was validated through cross-validated learning curves and hyperparameter optimization. Our findings underscore the importance of contextual feature design and advance state-of-the-art bug resolution modeling by integrating multiclass classification, temporal context, and model interpretability.
KW - bug triage
KW - class imbalance
KW - feature engineering
KW - machine learning
KW - Software maintenance
UR - https://www.scopus.com/pages/publications/105032513297
U2 - 10.1109/ETCM67548.2025.11304484
DO - 10.1109/ETCM67548.2025.11304484
M3 - Conference contribution
AN - SCOPUS:105032513297
T3 - ETCM 2025 - 9th Ecuador Technical Chapters Meeting
BT - ETCM 2025 - 9th Ecuador Technical Chapters Meeting
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th Ecuador Technical Chapters Meeting, ETCM 2025
Y2 - 21 October 2025 through 24 October 2025
ER -