TY - GEN
T1 - A Comparative Analysis of Vision Transformers and Convolutional Neural Networks in Cardiac Image Segmentation
AU - Granizo, Sebastion
AU - Baldeon-Calisto, Maria
AU - Iniguez, Milena
AU - Navarrete, Danny
AU - Riofrio, Daniel
AU - Perez-Perez, Noel
AU - Benitez, Diego
AU - Flores-Moyano, Ricardo
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - In recent years, Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have emerged as the dominant methods for automated cardiac image segmentation. CNNs are efficient architectures that capture local spatial patterns, whereas ViTs can model long-range global dependencies. Each type of network has been shown to perform better on certain tasks and datasets. In this work, we conducted a comparative analysis of ViTs and CNNs for cardiac image segmentation. We statistically evaluated the performance of five CNN and ViT architectures on the publicly available Automated Cardiac Diagnosis Challenge (ACDC) MRI dataset. Using a one-way ANOVA and Tukey's test, our analysis indicates that CNNs outperform Transformers in segmenting the right ventricle cavity, the left ventricle cavity, and the left ventricle myocardium. Furthermore, CNN architectures tend to be smaller and easier to train. Among all the networks considered, LinkNet achieves the highest performance, with a mean Dice score of 0.8965 and a mean ASSD of 0.2960.
KW - Cardiac MRI Segmentation
KW - Convolutional Neural Networks (CNNs)
KW - Image Segmentation
KW - Transformers
KW - Vision Transformers (ViT)
UR - http://www.scopus.com/inward/record.url?scp=85194059418&partnerID=8YFLogxK
U2 - 10.1109/ISDFS60797.2024.10527254
DO - 10.1109/ISDFS60797.2024.10527254
M3 - Conference contribution
AN - SCOPUS:85194059418
T3 - 12th International Symposium on Digital Forensics and Security, ISDFS 2024
BT - 12th International Symposium on Digital Forensics and Security, ISDFS 2024
A2 - Varol, Asaf
A2 - Karabatak, Murat
A2 - Varol, Cihan
A2 - Tuba, Eva
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 12th International Symposium on Digital Forensics and Security, ISDFS 2024
Y2 - 29 April 2024 through 30 April 2024
ER -