Title: The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions

Jun 27, 2023

Nishil Patel, Sebastian Lee, Stefano Sarao Mannelli, Sebastian Goldt, Andrew Saxe

Abstract: Reinforcement learning (RL) algorithms have proven transformative in a range of domains. To tackle real-world domains, these systems often use neural networks to learn policies directly from pixels or other high-dimensional sensory input. By contrast, much theory of RL has focused on discrete state spaces or worst-case analysis, and fundamental questions remain about the dynamics of policy learning in high-dimensional settings. Here, we propose a solvable high-dimensional model of RL that can capture a variety of learning protocols, and derive its typical dynamics as a set of closed-form ordinary differential equations (ODEs). We derive optimal schedules for the learning rates and task difficulty - analogous to annealing schemes and curricula during training in RL - and show that the model exhibits rich behaviour, including delayed learning under sparse rewards; a variety of learning regimes depending on reward baselines; and a speed-accuracy trade-off driven by reward stringency. Experiments on variants of the Procgen game "Bossfight" and Arcade Learning Environment game "Pong" also show such a speed-accuracy trade-off in practice. Together, these results take a step towards closing the gap between theory and practice in high-dimensional RL.
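The abstract does not spell out the model, so the following is only a minimal sketch of the kind of setting it describes, under stated assumptions: a "student" perceptron policy makes T binary decisions per episode on Gaussian inputs, a fixed "teacher" vector defines the correct decisions, and a reward of 1 arrives only if every decision in the episode was correct. The function name `simulate_rl_perceptron`, the update rule, and all parameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

def simulate_rl_perceptron(D=100, T=1, eta=1.0, episodes=20000, seed=0):
    """Toy teacher-student policy learning with an all-or-nothing episode reward.

    Returns the cosine overlap between student and teacher after each episode.
    All modelling choices here are assumptions made for illustration.
    """
    rng = np.random.default_rng(seed)
    teacher = rng.standard_normal(D)  # fixed vector defining the "correct" decisions
    student = rng.standard_normal(D)  # randomly initialised policy weights
    overlaps = np.empty(episodes)
    for ep in range(episodes):
        x = rng.standard_normal((T, D))      # T high-dimensional observations
        actions = np.sign(x @ student)       # student's binary decisions
        correct = np.sign(x @ teacher)       # decisions the teacher would make
        # Sparse reward: 1 only if every decision in the episode is correct.
        reward = float(np.all(actions == correct))
        # Reward-weighted Hebbian-style update, averaged over the T steps.
        student += eta / (T * np.sqrt(D)) * reward * (actions @ x)
        overlaps[ep] = student @ teacher / (
            np.linalg.norm(student) * np.linalg.norm(teacher)
        )
    return overlaps

if __name__ == "__main__":
    # Larger T means a more stringent (sparser) reward.
    for T in (1, 4, 8):
        ov = simulate_rl_perceptron(T=T)
        print(f"T={T}: final overlap = {ov[-1]:.3f}")
```

Varying `T` in this toy changes how often a reward is received, which is one simple way to probe the effect of reward stringency that the abstract mentions; the sketch makes no claim to reproduce the paper's ODEs or results.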

* 10 pages, 7 figures, Preprint
