Satyajit Rao | Swansea Computer Science Student Conference

Satyajit Rao (2037140)

Efficient Learning of the Optimal Probability Distribution over the Policy Space in Reinforcement Learning

Project Abstract

Over the past decade or so, the field of machine learning has made vast strides forward. Machine learning algorithms are used to solve a variety of problems in the world today, and several flavours of algorithm have been created to solve different categories of problem. One such technique is Reinforcement Learning (RL), which is commonly used when the environment is unexplored or when interacting with the environment in the real world is expensive or risky. In RL, the agent interacts with the environment randomly and is either rewarded or punished based on the outcome. Over time, the agent attempts to find optimal strategies. The problem with this technique is its poor time efficiency, given the random trial-and-error nature of its learning. We aim to develop an entropy-based approach to the task of policy search (choosing the optimal strategy from all the strategies the agent has come up with). By estimating the optimal probability distribution over the policy space, we can choose to perform the trials that are likely to yield positive results, thereby improving the time efficiency of RL. We will benchmark current popular RL algorithms with interesting policy search techniques (such as PILCO) in standardised environments (the Gym library). Following this, we will investigate potential approaches to the desired solution. We hope to provide a generalised entropy-based solution to the task of policy search that improves on the time efficiency of comparable existing techniques.
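To make the idea of a probability distribution over the policy space more concrete, the following is a minimal sketch and not the project's actual method. It assumes a small, finite set of candidate policies, simulates each episode's return as a noisy draw around a hidden mean, and swaps in a standard Boltzmann (softmax) distribution as the maximum-entropy weighting over candidates. All names (`softmax_distribution`, `entropy`, `search`, `true_means`, `temperature`) are hypothetical and chosen for illustration only.

```python
import math
import random


def softmax_distribution(estimated_returns, temperature=1.0):
    """Boltzmann (softmax) distribution over candidate policies.

    Assumption: this stands in for the 'optimal distribution over the
    policy space'; the project itself may define it differently.
    """
    peak = max(estimated_returns)  # subtract the max for numerical stability
    weights = [math.exp((r - peak) / temperature) for r in estimated_returns]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probabilities):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


def search(true_means, trials=200, temperature=0.2, seed=0):
    """Choose which candidate policy to trial by sampling from the distribution."""
    rng = random.Random(seed)
    n = len(true_means)
    estimates = [0.0] * n  # running mean return per candidate policy
    counts = [0] * n       # number of trials spent on each candidate
    for _ in range(trials):
        probs = softmax_distribution(estimates, temperature)
        i = rng.choices(range(n), weights=probs)[0]
        reward = rng.gauss(true_means[i], 0.1)  # simulated noisy episode return
        counts[i] += 1
        estimates[i] += (reward - estimates[i]) / counts[i]
    return estimates, counts, softmax_distribution(estimates, temperature)


if __name__ == "__main__":
    estimates, counts, probs = search([0.1, 0.5, 0.9])
    print("trials per policy:", counts)
    print("distribution entropy (nats):", entropy(probs))
```

In this sketch, a lower `temperature` concentrates trials on the candidates that currently look best, while a higher one keeps the distribution closer to uniform (higher entropy) and therefore explores more.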

Keywords: Data Science, Reinforcement Learning, Optimisation

Conference Details

Session: Presentation Stream 25 at Presentation Slot 3

Location: GH049, Wednesday 8th, 13:30 – 17:00

Markers: Megan Venn-Wycherley, Fernando Maestre Avila

Course: MSc Data Science, Masters PG

Future Plans: I’m looking for work