Joseph Ryan (2114730)

A study of Tetris and reinforcement learning

Project Abstract

Tetris is one of the most popular and easy-to-pick-up games of all time, so it serves as an excellent platform for testing AI models. However, despite its simple exterior, it has proved impenetrable for reinforcement learning to crack. The aim of this project was to use PPO, a gold-standard RL algorithm, to train a model to play the game, and to investigate what makes Tetris so difficult for RL models to learn. In this study I used PPO alongside a convolutional neural network to process images of an instance of the game, with the aim of learning to play it. Throughout experimentation I employed several techniques to aid the model, including customized reward functions, comparing single-move and final-state-only action spaces, and experimenting with combining heuristic and reinforcement learning approaches. The main finding of the experiment was reinforcement learning models' inability to learn from deeply delayed, stochastic rewards. While in theory reinforcement learning could find optimizations on already competent models, when training from scratch the initial hurdle of having no idea how to reliably achieve rewards proved too large to overcome. Even when provided with optimal heuristics, because rewards were always negative due to the model's lack of a concept of a good move, the models would instead learn to mitigate losses and become trapped in local optima, rather than learn to eliminate them entirely. Ultimately, however capable reinforcement learning models may be, there are still simple-seeming problems that they are unable to overcome.
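
As a rough illustration of the kind of training setup described above (not the exact code used in the project), the sketch below assumes a Gymnasium-compatible Tetris environment with image observations, here represented by a placeholder called TetrisImageEnv, and uses Stable-Baselines3's PPO with its built-in CNN policy. The reward-shaping wrapper is a hypothetical example of the customized reward functions mentioned, and the info keys it reads are assumptions rather than a real environment's API.

    import gymnasium as gym
    from stable_baselines3 import PPO

    class ShapedRewardWrapper(gym.Wrapper):
        """Hypothetical reward shaping: reward cleared lines, penalize stack growth."""
        def step(self, action):
            obs, reward, terminated, truncated, info = self.env.step(action)
            # Assumed info keys; a real Tetris environment may expose these differently.
            shaped = 10.0 * info.get("lines_cleared", 0) - 0.5 * info.get("height_increase", 0)
            return obs, reward + shaped, terminated, truncated, info

    # TetrisImageEnv is a placeholder for an image-observation Tetris environment.
    env = ShapedRewardWrapper(TetrisImageEnv())

    # "CnnPolicy" gives PPO a convolutional feature extractor for image observations.
    model = PPO("CnnPolicy", env, verbose=1)
    model.learn(total_timesteps=1_000_000)
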

Keywords: Reinforcement Learning, Deep Neural Networks, Computer Vision

 

 Conference Details

 

Session: Poster Session B at Poster Stand 85

Location: Sir Stanley Clarke Auditorium, Wednesday 8th, 09:00 – 12:30

Markers: Yuanbo Wu, Nicholas Micallef

Course: MSci Computer Science, 3rd Year

Future Plans: I’m continuing my studies