Thesis defence T.D. de Bruin: reinforcement learning

17 January 2020 10:00 - Location: Aula, TU Delft - By: web editorial team

Sample Efficient Deep Reinforcement Learning for Control. Promotor 1: Prof.dr.ir. Prof.dr. R. Babuska (3mE); Promotor 2: Prof.dr. K.P. Tuyls (3mE).

The arrival of intelligent, general-purpose robots that can learn to perform new tasks autonomously has been promised for a long time now. Deep reinforcement learning, which combines reinforcement learning with deep neural network function approximation, has the potential to enable robots to learn to perform a wide range of new tasks while requiring very little prior knowledge or human help. This framework might therefore help to finally make general-purpose robots a reality. However, the biggest successes of deep reinforcement learning have so far been in simulated game settings. To translate these successes to the real world, significant improvements are needed in the ability of these methods to learn quickly and safely. This thesis investigates what is needed to make this possible and makes contributions towards this goal.

Specifically, this thesis:

- Investigates how to value experiences, such that important ones can be remembered and prioritized. This enables more stable and efficient learning.

- Investigates how to learn to represent the state of the world through short-term objectives such as predicting the immediate effects of actions and compressing sensor data. Compared to only predicting the long-term effects of behaviors, as is standard in reinforcement learning, this helps learn more general behaviors more quickly.

- Investigates how deep learning can be combined with evolutionary strategies to quickly learn acceptable behaviors while subsequently improving them in a more stable and predictable manner.

More information?

Theses by PhD students can be found in the TU Delft Repository, the digital storage of publications of TU Delft. Theses become available within a few weeks after the thesis defence.