TY - GEN
T1 - Hyperparameter Tuning over an Attention Model for Image Captioning
AU - Castro, Roberto
AU - Pineda, Israel
AU - Morocho-Cayamcela, Manuel Eugenio
N1 - Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
PY - 2021
Y1 - 2021
N2 - Considering the historical trajectory and evolution of image captioning as a research area, this paper focuses on visual attention as an approach to solving captioning tasks with computer vision. This article studies the efficiency of different hyperparameter configurations on a state-of-the-art visual attention architecture composed of a pre-trained residual neural network encoder and a long short-term memory decoder. Results show that the selection of both the cost function and the gradient-based optimizer has a significant impact on the captioning results. Our system considers the cross-entropy, Kullback-Leibler divergence, mean squared error, and negative log-likelihood loss functions, as well as the adaptive moment estimation (Adam), AdamW, RMSprop, stochastic gradient descent, and Adadelta optimizers. Based on the performance metrics, the combination of cross-entropy with Adam is identified as the best alternative, returning a Top-5 accuracy of 73.092 and a BLEU-4 score of 0.201. With cross-entropy fixed as the loss function, the first two optimizers (Adam and AdamW) achieve the best performance, each with a BLEU-4 score of 0.201. In terms of inference loss, Adam outperforms AdamW (3.413 versus 3.418), with a Top-5 accuracy of 73.092 versus 72.989.
AB - Considering the historical trajectory and evolution of image captioning as a research area, this paper focuses on visual attention as an approach to solving captioning tasks with computer vision. This article studies the efficiency of different hyperparameter configurations on a state-of-the-art visual attention architecture composed of a pre-trained residual neural network encoder and a long short-term memory decoder. Results show that the selection of both the cost function and the gradient-based optimizer has a significant impact on the captioning results. Our system considers the cross-entropy, Kullback-Leibler divergence, mean squared error, and negative log-likelihood loss functions, as well as the adaptive moment estimation (Adam), AdamW, RMSprop, stochastic gradient descent, and Adadelta optimizers. Based on the performance metrics, the combination of cross-entropy with Adam is identified as the best alternative, returning a Top-5 accuracy of 73.092 and a BLEU-4 score of 0.201. With cross-entropy fixed as the loss function, the first two optimizers (Adam and AdamW) achieve the best performance, each with a BLEU-4 score of 0.201. In terms of inference loss, Adam outperforms AdamW (3.413 versus 3.418), with a Top-5 accuracy of 73.092 versus 72.989.
KW - Artificial intelligence
KW - Computer vision
KW - Image captioning
KW - Supervised learning
KW - Visual attention
UR - http://www.scopus.com/inward/record.url?scp=85121588563&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-89941-7_13
DO - 10.1007/978-3-030-89941-7_13
M3 - Conference contribution
AN - SCOPUS:85121588563
SN - 9783030899400
T3 - Communications in Computer and Information Science
SP - 172
EP - 183
BT - Information and Communication Technologies - 9th Conference of Ecuador, TICEC 2021, Proceedings
A2 - Salgado Guerrero, Juan Pablo
A2 - Chicaiza Espinosa, Janneth
A2 - Cerrada Lozada, Mariela
A2 - Berrezueta-Guzman, Santiago
PB - Springer Science and Business Media Deutschland GmbH
T2 - 9th Conference on Information and Communication Technologies of Ecuador, TICEC 2021
Y2 - 24 November 2021 through 26 November 2021
ER -