Hyperparameter Tuning over an Attention Model for Image Captioning

Roberto Castro, Israel Pineda, Manuel Eugenio Morocho-Cayamcela

Research output: Chapter in book/report/conference proceeding, Conference contribution, peer-reviewed

2 Citations (Scopus)


Considering the historical trajectory and evolution of image captioning as a research area, this paper focuses on visual attention as an approach to solving captioning tasks with computer vision. It studies the effect of different hyperparameter configurations on a state-of-the-art visual attention architecture composed of a pre-trained residual neural network encoder and a long short-term memory decoder. Results show that the choice of both the cost function and the gradient-based optimizer has a significant impact on the captioning results. Our system considers the cross-entropy, Kullback-Leibler divergence, mean squared error, and negative log-likelihood loss functions, as well as the adaptive moment estimation (Adam), AdamW, RMSprop, stochastic gradient descent, and Adadelta optimizers. Based on the performance metrics, the combination of cross-entropy with Adam is identified as the best alternative, returning a Top-5 accuracy of 73.092 and a BLEU-4 score of 0.201. With cross-entropy held fixed as the loss function, the first two optimizers (Adam and AdamW) perform best, each reaching a BLEU-4 score of 0.201. Adam narrowly outperforms AdamW on inference loss (3.413 versus 3.418) and on Top-5 accuracy (73.092 versus 72.989).
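
As a rough illustration of the search space described in the abstract, the sketch below enumerates every loss/optimizer pairing and picks the better of the two configurations whose metrics the abstract reports. The names `GRID`, `REPORTED`, `rank_key`, and `best_config` are hypothetical, and the ranking order (BLEU-4 first, then Top-5 accuracy, then lower loss) is an assumption made for this example, not necessarily the authors' selection procedure.

```python
from itertools import product

# Search space as listed in the abstract (plain labels, not library calls).
LOSSES = ["cross-entropy", "KL divergence", "mean squared error",
          "negative log-likelihood"]
OPTIMIZERS = ["Adam", "AdamW", "RMSprop", "SGD", "Adadelta"]

# Every (loss, optimizer) pairing in the search space.
GRID = list(product(LOSSES, OPTIMIZERS))

# Only the two results quoted in the abstract; the remaining
# configurations are not reported there and are left out on purpose.
REPORTED = {
    ("cross-entropy", "Adam"): {"bleu4": 0.201, "top5": 73.092, "loss": 3.413},
    ("cross-entropy", "AdamW"): {"bleu4": 0.201, "top5": 72.989, "loss": 3.418},
}


def rank_key(metrics):
    # Assumed ordering: higher BLEU-4, then higher Top-5, then lower loss.
    return (metrics["bleu4"], metrics["top5"], -metrics["loss"])


def best_config(results):
    """Return the configuration whose metrics rank highest under rank_key."""
    return max(results, key=lambda cfg: rank_key(results[cfg]))


if __name__ == "__main__":
    print(len(GRID), "configurations in the grid")
    print("best reported configuration:", best_config(REPORTED))
```

Because both reported configurations tie on BLEU-4, the comparison falls through to Top-5 accuracy, which mirrors how the abstract separates Adam from AdamW.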

Original language: English
Title of host publication: Information and Communication Technologies - 9th Conference of Ecuador, TICEC 2021, Proceedings
Editors: Juan Pablo Salgado Guerrero, Janneth Chicaiza Espinosa, Mariela Cerrada Lozada, Santiago Berrezueta-Guzman
Publisher: Springer Science and Business Media Deutschland GmbH
Number of pages: 12
ISBN (print): 9783030899400
Status: Published - 2021
Published externally
Event: 9th Conference on Information and Communication Technologies of Ecuador, TICEC 2021 - Virtual, Online
Duration: 24 Nov 2021 - 26 Nov 2021

Publication series

Name: Communications in Computer and Information Science
Volume: 1456 CCIS
ISSN (print): 1865-0929
ISSN (electronic): 1865-0937


Conference: 9th Conference on Information and Communication Technologies of Ecuador, TICEC 2021
City: Virtual, Online

