What data or environments to use for training to improve downstream performance is a longstanding and very topical question in reinforcement learning. In particular, Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula enable agents to be robust to in- and out-of-distribution tasks. We ask to what extent these methods are themselves robust when applied to a novel setting, closely inspired by a real-world robotics problem. Surprisingly, we find that state-of-the-art UED methods either do not improve upon the na\"{i}ve baseline of Domain Randomisation (DR) or require substantial hyperparameter tuning to do so. Our analysis shows that this is because their underlying scoring functions fail to predict intuitive measures of ``learnability'', i.e., they fail to identify the settings that the agent sometimes solves, but not always. Based on this, we instead directly train on levels with high learnability and find that this simple and intuitive approach outperforms both UED methods and DR in several binary-outcome environments, including our domain and the standard UED domain of Minigrid. We further introduce a new adversarial evaluation procedure for directly measuring robustness, closely mirroring the conditional value at risk (CVaR). We open-source all our code and present visualisations of final policies here: https://github.com/amacrutherford/sampling-for-learnability.
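The core idea of sampling for learnability admits a compact sketch. The snippet below is a minimal illustration rather than the paper's implementation: it assumes that per-level success rates of the current policy on a binary-outcome task have already been estimated from rollouts, and it scores each level by $p(1-p)$, which is maximised for levels the agent sometimes solves but not always. The function names are ours, not taken from the repository.

```python
import jax
import jax.numpy as jnp

def learnability_scores(success_rates: jnp.ndarray) -> jnp.ndarray:
    # Score each level by p * (1 - p), where p is the empirical success
    # rate of the current policy on that level. The score peaks at
    # p = 0.5 (sometimes solved) and is zero at p = 0 or p = 1
    # (never or always solved).
    return success_rates * (1.0 - success_rates)

def select_training_levels(success_rates: jnp.ndarray, num_levels: int) -> jnp.ndarray:
    # Keep the `num_levels` candidate levels with the highest
    # learnability score for the next round of training.
    scores = learnability_scores(success_rates)
    _, top_idx = jax.lax.top_k(scores, num_levels)
    return top_idx

# Example: estimated success rates for 8 candidate levels.
p = jnp.array([0.0, 0.1, 0.5, 0.9, 1.0, 0.4, 0.6, 0.05])
print(select_training_levels(p, num_levels=3))  # -> levels near p = 0.5
```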
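For the adversarial evaluation, the abstract states only that it closely mirrors the conditional value at risk. As a reference point, the standard definition of CVaR at level $\alpha$ for a return $R$ over levels (our notation, not the paper's) is
\[
\mathrm{CVaR}_\alpha(R) \;=\; \mathbb{E}\!\left[\, R \;\middle|\; R \le F_R^{-1}(\alpha) \,\right],
\]
i.e., the expected return conditioned on falling in the worst $\alpha$-fraction of outcomes; an adversarial evaluation approximates this by actively searching for the levels on which the policy performs worst.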