Title: Self-Play and Self-Describe: Policy Adaptation with Vision-Language Foundation Models

Dec 14, 2022

Yuying Ge, Annabella Macaluso, Li Erran Li, Ping Luo, Xiaolong Wang

Abstract: Recent progress in vision-language foundation models has brought significant advances toward building general-purpose robots. By using the pre-trained models to encode the scene and instructions as inputs for decision making, an instruction-conditioned policy can generalize across different objects and tasks. While this is encouraging, the policy still fails in most cases when given an unseen task or environment. To adapt the policy to unseen tasks and environments, we explore a new paradigm for leveraging the pre-trained foundation models with Self-PLAY and Self-Describe (SPLAYD). When deploying the trained policy to a new task or a new environment, we first let the policy self-play with randomly generated instructions and record the resulting demonstrations. Although the execution may be wrong, that is, it may not match the sampled instruction, we can use the pre-trained foundation models to accurately self-describe (i.e., re-label or classify) the demonstrations. This automatically provides new pairs of demonstration-instruction data for policy fine-tuning. We evaluate our method in a broad range of experiments focused on generalization to unseen objects, unseen tasks, unseen environments, and sim-to-real transfer. We show that SPLAYD improves over baselines by a large margin in all cases. Our project page is available at https://geyuying.github.io/SPLAYD/

* Project page: https://geyuying.github.io/SPLAYD/
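
The self-play and self-describe loop from the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `self_play_and_describe`, `toy_rollout`, `toy_describe`, and `Demo` are hypothetical, trajectories are plain text stand-ins for recorded images and actions, and the word-overlap scorer is only a placeholder for a pre-trained vision-language model.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Assumption: a trajectory is a list of per-step text descriptions.
# A real system would record images and robot actions instead.
Trajectory = List[str]


@dataclass
class Demo:
    trajectory: Trajectory
    instruction: str  # the re-labeled instruction, not the sampled one


def self_play_and_describe(
    rollout: Callable[[str], Trajectory],
    describe: Callable[[Trajectory, Sequence[str]], str],
    instruction_pool: Sequence[str],
    num_episodes: int,
    rng: random.Random,
) -> List[Demo]:
    """Collect (demonstration, instruction) pairs for later fine-tuning."""
    demos = []
    for _ in range(num_episodes):
        # Self-play: run the policy on a randomly chosen instruction.
        sampled = rng.choice(instruction_pool)
        trajectory = rollout(sampled)
        # Self-describe: label what actually happened, which may differ
        # from the instruction that was sampled.
        relabeled = describe(trajectory, instruction_pool)
        demos.append(Demo(trajectory, relabeled))
    return demos


def toy_rollout(instruction: str) -> Trajectory:
    # Hypothetical imperfect policy: it ignores the instruction and
    # always ends up pushing the red block.
    return ["reach toward table", "push red block"]


def toy_describe(trajectory: Trajectory, candidates: Sequence[str]) -> str:
    # Placeholder for a vision-language model: pick the candidate that
    # shares the most words with the final step of the trajectory.
    final_words = set(trajectory[-1].split())
    return max(candidates, key=lambda c: len(set(c.split()) & final_words))


pool = ["push red block", "lift blue cube"]
demos = self_play_and_describe(toy_rollout, toy_describe, pool, 5, random.Random(0))
for demo in demos:
    print(demo.instruction)
```

In this toy setup every collected demonstration is labeled "push red block", whichever instruction was sampled, because the label comes from the observed behavior. The resulting pairs are what a fine-tuning step would consume; that step is omitted here because the abstract does not specify it.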
