Catherine Cang

URLB: Unsupervised Reinforcement Learning Benchmark

Oct 28, 2021

Michael Laskin, Denis Yarats, Hao Liu, Kimin Lee, Albert Zhan, Kevin Lu, Catherine Cang, Lerrel Pinto, Pieter Abbeel

Abstract: Deep Reinforcement Learning (RL) has emerged as a powerful paradigm to solve a range of complex yet specific control tasks. Yet training generalist agents that can quickly adapt to new tasks remains an outstanding challenge. Recent advances in unsupervised RL have shown that pre-training RL agents with self-supervised intrinsic rewards can result in efficient adaptation. However, these algorithms have been hard to compare and develop due to the lack of a unified benchmark. To this end, we introduce the Unsupervised Reinforcement Learning Benchmark (URLB). URLB consists of two phases: reward-free pre-training and downstream task adaptation with extrinsic rewards. Building on the DeepMind Control Suite, we provide twelve continuous control tasks from three domains for evaluation and open-source code for eight leading unsupervised RL methods. We find that the implemented baselines make progress but are not able to solve URLB, and we propose directions for future research.

* Code for the Unsupervised Reinforcement Learning Benchmark is available at https://github.com/rll-research/url_benchmark
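
The two-phase protocol described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the URLB codebase: the `ChainEnv` environment, the tabular `TabularQAgent`, the count-based novelty bonus standing in for a self-supervised intrinsic reward, and the `run_phase` helper are all hypothetical. The benchmark itself uses DeepMind Control Suite tasks and deep RL agents.

```python
import random
from collections import defaultdict


class ChainEnv:
    """Hypothetical toy environment: a 1-D chain with a rewarding right end."""

    def __init__(self, length=6):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 1 moves right, any other action moves left
        delta = 1 if action == 1 else -1
        self.state = min(max(self.state + delta, 0), self.length - 1)
        extrinsic = 1.0 if self.state == self.length - 1 else 0.0
        return self.state, extrinsic


class TabularQAgent:
    """Hypothetical epsilon-greedy Q-learning agent with a visit-count bonus."""

    def __init__(self, n_actions=2, lr=0.5, gamma=0.9, eps=0.2, seed=0):
        self.n_actions = n_actions
        self.lr, self.gamma, self.eps = lr, gamma, eps
        self.rng = random.Random(seed)
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.visits = defaultdict(int)

    def act(self, state):
        if self.rng.random() < self.eps:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def intrinsic_reward(self, state):
        # Count-based novelty bonus: an assumed stand-in for the
        # self-supervised intrinsic rewards mentioned in the abstract.
        self.visits[state] += 1
        return 1.0 / self.visits[state] ** 0.5

    def update(self, s, a, r, s_next):
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.lr * (target - self.q[s][a])


def run_phase(env, agent, episodes, use_extrinsic, max_steps=20):
    """Run fixed-length episodes; returns the extrinsic return of each one."""
    returns = []
    for _ in range(episodes):
        s = env.reset()
        total = 0.0
        for _ in range(max_steps):
            a = agent.act(s)
            s_next, r_ext = env.step(a)
            # Phase 1 hides the task reward; phase 2 trains on it.
            r = r_ext if use_extrinsic else agent.intrinsic_reward(s_next)
            agent.update(s, a, r, s_next)
            total += r_ext
            s = s_next
        returns.append(total)
    return returns


if __name__ == "__main__":
    env, agent = ChainEnv(), TabularQAgent(seed=0)
    run_phase(env, agent, episodes=50, use_extrinsic=False)  # reward-free pre-training
    finetune = run_phase(env, agent, episodes=10, use_extrinsic=True)  # adaptation
    print("extrinsic returns during adaptation:", finetune)
```

The same agent object is carried from the first phase into the second, so whatever it learned without task reward is the starting point for adaptation. Comparing the adaptation returns against a freshly initialized agent would be the toy analogue of the benchmark's evaluation.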

Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL

Jun 18, 2021

Catherine Cang, Aravind Rajeswaran, Pieter Abbeel, Michael Laskin

Abstract: Offline Reinforcement Learning (RL) aims to extract near-optimal policies from imperfect offline data without additional environment interactions. Extracting policies from diverse offline datasets has the potential to expand the range of applicability of RL by making the training process safer, faster, and more streamlined. We investigate how to improve the performance of offline RL algorithms, their robustness to the quality of offline data, and their generalization capabilities. To this end, we introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE). Our algorithm is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary. When combined, they substantially improve the performance and generalization of offline RL policies. On the widely studied D4RL offline RL benchmark, we find that MABE achieves higher average performance than prior model-free and model-based algorithms. In experiments that require cross-domain generalization, we find that MABE outperforms prior methods. Our website is available at https://sites.google.com/berkeley.edu/mabe .
