Title: Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration

Oct 23, 2024

Max Wilcoxson, Qiyang Li, Kevin Frans, Sergey Levine



Abstract: Unsupervised pretraining has been transformative in many supervised domains. However, applying such ideas to reinforcement learning (RL) presents a unique challenge in that fine-tuning does not involve mimicking task-specific data, but rather exploring and locating the solution through iterative self-improvement. In this work, we study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies. While prior data can be used to pretrain a set of low-level skills, or as additional off-policy data for online RL, it has been unclear how to combine these ideas effectively for online exploration. Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits. Our method first extracts low-level skills using a variational autoencoder (VAE), and then pseudo-relabels unlabeled trajectories using an optimistic reward model, transforming prior data into high-level, task-relevant examples. Finally, SUPE uses these transformed examples as additional off-policy data for online RL to learn a high-level policy that composes pretrained low-level skills to explore efficiently. We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks. Code: https://github.com/rail-berkeley/supe.

* 23 pages, 10 figures
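
The pseudo-relabeling step described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation (see the linked repository for that); it only shows one plausible way an unlabeled trajectory could be cut into fixed-length segments and turned into high-level transitions. The function name `relabel_segments`, the `horizon` parameter, the callables `encode_skill`, `reward_model`, and `bonus`, and the choice to score each segment by a predicted reward plus an exploration bonus at its final state are all assumptions made for illustration.

```python
import numpy as np

def relabel_segments(obs, encode_skill, reward_model, bonus, horizon):
    """Convert one unlabeled trajectory into high-level transitions.

    Hypothetical helper: each non-overlapping segment of `horizon` steps
    becomes one (state, skill, reward, next_state) tuple. The reward is an
    assumed optimistic estimate: predicted reward plus an exploration bonus,
    both evaluated at the segment's final state.
    """
    obs = np.asarray(obs, dtype=float)
    num_steps = len(obs) - 1  # obs holds num_steps + 1 states
    transitions = []
    for start in range(0, num_steps - horizon + 1, horizon):
        end = start + horizon
        segment = obs[start:end + 1]
        transitions.append({
            "state": obs[start],
            # Stand-in for a pretrained VAE encoder mapping a segment to a skill latent.
            "skill": encode_skill(segment),
            "reward": float(reward_model(obs[end]) + bonus(obs[end])),
            "next_state": obs[end],
        })
    return transitions

# Toy stand-ins (assumptions, not learned models).
obs = np.arange(9, dtype=float).reshape(9, 1)       # 8 steps along a 1-D line
toy_encoder = lambda seg: seg[-1] - seg[0]           # net displacement as the "skill"
toy_reward = lambda s: -1.0                          # sparse task: no reward seen yet
toy_bonus = lambda s: 1.0 / np.sqrt(1.0 + s[0])      # arbitrary placeholder bonus

high_level = relabel_segments(obs, toy_encoder, toy_reward, toy_bonus, horizon=4)
```

Under these assumptions, the resulting tuples could be added to the replay buffer of an off-policy learner that trains the high-level policy, which is the role the abstract assigns to the transformed prior data.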
