An Efficient Deep Q-learning Strategy for Sequential Decision-making in Game-playing

Oscar Chang, Manuel Eugenio Morocho-Cayamcela, Israel Pineda, Kevin Cardenas

Research output: Chapter in book/report/conference proceeding › Conference contribution › Peer-reviewed

2 Citations (Scopus)

Abstract

This paper presents a deep reinforcement learning model that efficiently learns a sequential decision-making policy for playing tic-tac-toe directly from high-dimensional video. To produce a stable, sparse neural representation of the board states, a pre-trained convolutional neural network is used, followed by a fully connected sigmoidal network. The assembly behaves as a Q-matrix and produces the final state-decision pairs that control a robotic arm placing physical tokens on the board. The hyperparameters of the whole network are tuned to produce a stable, trainable array of elements. An internal clock composed of internal neurons is integrated to give the agent a sense of sequential timing. To solve the max(⊙) function, a novel algorithm is introduced to search for the Q-network values. The algorithm uses a dedicated sigmoidal network initialized with random parameters; under backpropagation, it iteratively moves to a stable plateau that mimics the all-zeros condition of an initial Q-matrix. Next, the agent uses Bellman's reinforcement principles to learn an optimal policy with a noticeable look-ahead capability. Computer simulations driving a physical robot demonstrated the convergence and effectiveness of the proposed methodology, as well as a marked ability in sequential decision-making, taking raw video frames as input.
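As a reading aid for the Bellman step mentioned in the abstract, the following is a minimal sketch of the standard tabular Q-learning update, starting from an all-zeros Q-matrix. It is not the authors' network-based method: the function name `q_update`, the three toy states, and the values of `alpha` and `gamma` are illustrative assumptions, and the paper's convolutional front end, internal clock, and max-search network are not modeled.

```python
def q_update(q, state, action, reward, next_state, done, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning (Bellman) update in place; return the new value.

    All names and default values here are hypothetical, chosen for illustration.
    """
    # Look-ahead term: best value reachable from the next state (zero once the game ends).
    best_next = 0.0 if done else max(q[next_state])
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the Bellman target.
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Toy setup (assumed): 3 abstract states, 9 board cells as actions, all-zeros Q-matrix.
q = [[0.0] * 9 for _ in range(3)]

# A winning move from state 1 receives reward 1 and ends the game.
q_update(q, state=1, action=4, reward=1.0, next_state=2, done=True)

# An earlier move from state 0 that leads to state 1 earns no immediate reward,
# but inherits discounted value from the win one step ahead.
q_update(q, state=0, action=0, reward=0.0, next_state=1, done=False)
```

After these two updates, the entry for the earlier move is nonzero even though its immediate reward was zero, which is the look-ahead effect the abstract attributes to Bellman's principle.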

Original language: English
Title of host publication: Proceedings - 3rd International Conference on Information Systems and Software Technologies, ICI2ST 2022
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 172-177
Number of pages: 6
ISBN (electronic): 9781665455176
DOI
Status: Published - 2022
Event: 3rd International Conference on Information Systems and Software Technologies, ICI2ST 2022 - Quito, Ecuador
Duration: 8 Nov 2022 to 10 Nov 2022

Publication series

Name: Proceedings - 3rd International Conference on Information Systems and Software Technologies, ICI2ST 2022

Conference

Conference: 3rd International Conference on Information Systems and Software Technologies, ICI2ST 2022
Country/Territory: Ecuador
City: Quito
Period: 8 Nov 2022 to 10 Nov 2022
