TY - GEN
T1 - A Comparative Study of Machine Learning Models for Two-Tier Android Malware Classification with Dynamic Behavioral Analysis
AU - Torres, Jorge
AU - Grijalva, Felipe
AU - Chushig-Muzo, David
AU - Curiel, Luis Bote
AU - Loza, Malena
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
AB - The rapid proliferation of Android malware has emerged as a critical threat to global cybersecurity. This study comparatively evaluates five supervised classification algorithms: Random Forest (RF), Support Vector Machines (SVM) with an RBF kernel, Artificial Neural Networks (ANNs), Naive Bayes, and the novel TabNet model. The study uses the CCCS-CIC-AndMal-2020 dataset, which comprises 200,000 malware samples categorized into 14 classes and 191 families, with features dynamically extracted during application execution in emulated environments. Predictive performance was assessed at two hierarchical classification levels, distinguishing between broad malware categories and family-level attribution. To address class imbalance, oversampling techniques were considered. Precision, recall, and F1-score metrics, complemented by confusion matrices and ROC curves, were used for comprehensive evaluation. Statistical significance of differences among classifiers was determined using the Friedman test and the Nemenyi post-hoc test. Experimental results showed that RF, SVM, and ANNs consistently outperformed the other models across most metrics. This research provides a robust analytical framework for developing intelligent malware detection systems, contributing significantly to enhanced mobile cybersecurity.
KW - CCCS-CIC-AndMal-2020
KW - Malware
KW - Naive Bayes
KW - Neural Networks
KW - Random Forest
KW - SVM
KW - Supervised classification
KW - TabNet
UR - https://www.scopus.com/pages/publications/105022084436
U2 - 10.1007/978-3-032-10486-1_41
DO - 10.1007/978-3-032-10486-1_41
M3 - Conference contribution
AN - SCOPUS:105022084436
SN - 9783032104854
T3 - Lecture Notes in Computer Science
SP - 445
EP - 456
BT - Intelligent Data Engineering and Automated Learning, IDEAL 2025 - 26th International Conference, Proceedings
A2 - Martínez, Luis
A2 - Camacho, David
A2 - Yin, Hujun
A2 - Dutta, Bapi
A2 - Yera, Raciel
A2 - Rodríguez Domínguez, Rosa M.
A2 - Tallón-Ballesteros, Antonio
PB - Springer Science and Business Media Deutschland GmbH
T2 - 26th International Conference on Intelligent Data Engineering and Automated Learning, IDEAL 2025
Y2 - 13 November 2025 through 15 November 2025
ER -