TY - GEN
T1 - Distilling Vision Transformers for No-Reference Perceptual CT Image Quality Assessment
AU - Baldeon-Calisto, Maria G.
AU - Rivera-Velastegui, Francisco
AU - Lai-Yuen, Susana K.
AU - Riofrío, Daniel
AU - Pérez-Pérez, Noel
AU - Benítez, Diego
AU - Flores-Moyano, Ricardo
N1 - Publisher Copyright:
© 2024 SPIE.
PY - 2024
Y1 - 2024
AB - Image quality assessment of CT scans is of utmost importance in balancing radiation dose and image quality. Nonetheless, estimating the image quality of CT scans is a highly subjective task that cannot be adequately captured by a single quantitative metric. In this work, we present a novel Vision Transformer network for no-reference CT image quality assessment. Our network combines convolutional operations and multi-head self-attention mechanisms by adding a powerful convolutional stem at the beginning of the traditional ViT network. To enhance the performance and efficiency of the network, we introduce a distillation methodology comprising two sequential steps. In Step I, we construct a “teacher ensemble network” by training five Vision Transformer networks under a five-fold division scheme. In Step II, we train a single Vision Transformer, referred to as the “student network”, using the teacher’s predictions as new labels; the student network is also optimized on the original labeled dataset. The effectiveness of the proposed model is evaluated on the task of predicting image quality scores for low-dose abdominal CT images from the LDCTIQAC2023 Grand Challenge. Our model demonstrates strong performance, ranking 6th in the testing phase of the challenge. Additionally, our experiments highlight the effectiveness of incorporating a convolutional stem into the ViT architecture and of the distillation methodology.
KW - Image Quality Assessment
KW - Low-dose Computed Tomography
KW - Medical Image Classification
KW - Transformer model distillation
KW - Vision Transformers
UR - http://www.scopus.com/inward/record.url?scp=85193512732&partnerID=8YFLogxK
DO - 10.1117/12.3004838
M3 - Conference contribution
AN - SCOPUS:85193512732
T3 - Progress in Biomedical Optics and Imaging - Proceedings of SPIE
BT - Medical Imaging 2024: Image Processing
A2 - Colliot, Olivier
A2 - Mitra, Jhimli
PB - SPIE
T2 - Medical Imaging 2024: Image Processing
Y2 - 19 February 2024 through 22 February 2024
ER -
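
Note: the abstract outlines a two-step ensemble-distillation procedure (Step I: a five-teacher ensemble trained under a five-fold split; Step II: a student fit to the averaged teacher predictions alongside the original labels). Below is a minimal, hypothetical PyTorch sketch of that procedure. The tiny stand-in regressor (make_model), the toy data, and the equal weighting of the two student loss terms are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

def make_model() -> nn.Module:
    # Hypothetical stand-in for the paper's ViT-with-convolutional-stem
    # regressor: a conv stem followed by a scalar quality-score head.
    return nn.Sequential(
        nn.Conv2d(1, 32, kernel_size=7, stride=4, padding=3),  # convolutional stem
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(32, 1),  # predicted image-quality score
    )

def fit(model, xs, ys, epochs=5, lr=1e-4):
    # Plain MSE regression on (image, quality score) pairs.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(xs).squeeze(-1), ys)
        loss.backward()
        opt.step()

# Toy stand-in for the labeled LDCTIQAC2023 training set.
images = torch.randn(50, 1, 64, 64)
scores = torch.rand(50)

# Step I: "teacher ensemble network" -- five teachers, each trained on
# four of the five folds.
folds = torch.randperm(len(images)).chunk(5)
teachers = []
for k in range(5):
    train_idx = torch.cat([f for j, f in enumerate(folds) if j != k])
    teacher = make_model()
    fit(teacher, images[train_idx], scores[train_idx])
    teachers.append(teacher)

# Step II: the "student network" learns from the averaged teacher
# predictions while also fitting the original labels (equal weighting
# of the two terms is an assumption).
with torch.no_grad():
    soft_labels = torch.stack([t(images).squeeze(-1) for t in teachers]).mean(dim=0)

student = make_model()
opt = torch.optim.Adam(student.parameters(), lr=1e-4)
for _ in range(5):
    opt.zero_grad()
    pred = student(images).squeeze(-1)
    loss = (nn.functional.mse_loss(pred, scores)
            + nn.functional.mse_loss(pred, soft_labels))
    loss.backward()
    opt.step()
```

At inference time only the single student is evaluated, which is the efficiency benefit the abstract attributes to distillation: the five-model ensemble is paid for once, at training time.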