Title: PWM: Policy Learning with Large World Models

Jul 02, 2024

Ignat Georgiev, Varun Giridhar, Nicklas Hansen, Animesh Garg

Abstract: Reinforcement Learning (RL) has achieved impressive results on complex tasks but struggles in multi-task settings with different embodiments. World models offer scalability by learning a simulation of the environment, yet they often rely on inefficient gradient-free optimization methods. We introduce Policy learning with large World Models (PWM), a novel model-based RL algorithm that learns continuous control policies from large multi-task world models. By pre-training the world model on offline data and using it for first-order gradient policy learning, PWM effectively solves tasks with up to 152 action dimensions and outperforms methods using ground-truth dynamics. Additionally, PWM scales to an 80-task setting, achieving up to 27% higher rewards than existing baselines without the need for expensive online planning. Visualizations and code available at https://policy-world-model.github.io.

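To illustrate what "first-order gradient policy learning" through a model means in general, here is a minimal toy sketch. It is not the PWM algorithm or the authors' code: the scalar linear model `s' = a*s + b*u`, the quadratic reward, the linear policy `u = -k*s`, and every function and parameter name are assumptions made up for illustration. The model is fixed by hand here and stands in for a pre-trained, differentiable world model. The point is only that the return's derivative with respect to the policy parameter can be propagated along the model rollout and used directly for gradient ascent, instead of searching over actions with a gradient-free planner.

```python
def rollout_return_and_grad(k, s0=1.0, a=0.9, b=0.5, horizon=10, action_cost=0.1):
    """Roll the policy u = -k*s through the toy model s' = a*s + b*u.

    Returns (total_reward, d total_reward / d k). The derivative is
    obtained by carrying ds/dk along the rollout (chain rule), i.e. a
    first-order gradient through the model. All names are hypothetical.
    """
    s, ds_dk = s0, 0.0
    total, dtotal_dk = 0.0, 0.0
    for _ in range(horizon):
        # Policy and its sensitivity to k.
        u = -k * s
        du_dk = -s - k * ds_dk
        # Reward r = -(s^2 + action_cost * u^2) and its derivative.
        total += -(s * s + action_cost * u * u)
        dtotal_dk += -(2.0 * s * ds_dk + 2.0 * action_cost * u * du_dk)
        # Model step and the matching sensitivity update.
        s, ds_dk = a * s + b * u, a * ds_dk + b * du_dk
    return total, dtotal_dk


def train_policy(k=0.0, lr=0.05, steps=200):
    """Plain gradient ascent on the model-predicted return."""
    for _ in range(steps):
        _, grad = rollout_return_and_grad(k)
        k += lr * grad
    return k


if __name__ == "__main__":
    k_trained = train_policy()
    print("return before training:", rollout_return_and_grad(0.0)[0])
    print("return after training: ", rollout_return_and_grad(k_trained)[0])
```

In a realistic setting the model and policy would be neural networks and the gradient would come from automatic differentiation rather than a hand-written chain rule; the structure of the loop (roll out in the model, differentiate the return, update the policy) is what the sketch is meant to convey.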
