TY - GEN
T1 - Semi-Supervised Learning for Volcanic Seismic Event Classification
T2 - 9th Ecuador Technical Chapters Meeting, ETCM 2025
AU - Estrella, Pavel
AU - Grijalva, Felipe
AU - Benitez, Diego S.
AU - Vega-Sanchez, Jose
AU - Perez-Perez, Noel
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - This paper presents a comprehensive analysis of the impact of the proportion of labeled data on the performance of semi-supervised learning models for the classification of volcanic micro-seismic events. This work utilizes a dataset of over 22,000 micro-earthquake records from the Cotopaxi Volcano, with extracted features from time, frequency, and scale domains. Two semi-supervised approaches are implemented: Self-Training, using Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB) as base classifiers; and Label Spreading, based on graph-based label propagation. Model performance is evaluated using two primary metrics: the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) and F1-score. The SVM Self-Training model achieved the highest results, with an AUC of 0.9575 and an F1-score of 0.9472 when trained with 90% of labeled data. The RF model also performed robustly, particularly in noisy or imbalanced scenarios, reaching an AUC of 0.9505 and an F1-score of 0.9436. In contrast, NB showed limited gains, and the Label Spreading model failed to improve with more labeled data, stabilizing at an AUC around 0.88. These findings highlight the effectiveness of SVM and RF in leveraging unlabeled data for seismic event classification under varying label scarcity conditions.
AB - This paper presents a comprehensive analysis of the impact of the proportion of labeled data on the performance of semi-supervised learning models for the classification of volcanic micro-seismic events. This work utilizes a dataset of over 22,000 micro-earthquake records from the Cotopaxi Volcano, with extracted features from time, frequency, and scale domains. Two semi-supervised approaches are implemented: Self-Training, using Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB) as base classifiers; and Label Spreading, based on graph-based label propagation. Model performance is evaluated using two primary metrics: the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) and F1-score. The SVM Self-Training model achieved the highest results, with an AUC of 0.9575 and an F1-score of 0.9472 when trained with 90% of labeled data. The RF model also performed robustly, particularly in noisy or imbalanced scenarios, reaching an AUC of 0.9505 and an F1-score of 0.9436. In contrast, NB showed limited gains, and the Label Spreading model failed to improve with more labeled data, stabilizing at an AUC around 0.88. These findings highlight the effectiveness of SVM and RF in leveraging unlabeled data for seismic event classification under varying label scarcity conditions.
KW - Microseisms
KW - Seismic analysis
KW - Semi-supervised classification
UR - https://www.scopus.com/pages/publications/105032516946
U2 - 10.1109/ETCM67548.2025.11304433
DO - 10.1109/ETCM67548.2025.11304433
M3 - Conference contribution
AN - SCOPUS:105032516946
T3 - ETCM 2025 - 9th Ecuador Technical Chapters Meeting
BT - ETCM 2025 - 9th Ecuador Technical Chapters Meeting
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 21 October 2025 through 24 October 2025
ER -