Title: UniZero: Generalized and Efficient Planning with Scalable Latent World Models

Jun 15, 2024

Yuan Pu, Yazhe Niu, Jiyuan Ren, Zhenjie Yang, Hongsheng Li, Yu Liu

Abstract: Learning predictive world models is essential for enhancing the planning capabilities of reinforcement learning agents. Notably, the MuZero-style algorithms, based on the value equivalence principle and Monte Carlo Tree Search (MCTS), have achieved superhuman performance in various domains. However, in environments that require capturing long-term dependencies, MuZero's performance deteriorates rapidly. We identify that this is partially due to the entanglement of latent representations with historical information, which results in incompatibility with the auxiliary self-supervised state regularization. To overcome this limitation, we present UniZero, a novel approach that disentangles latent states from implicit latent history using a transformer-based latent world model. By concurrently predicting latent dynamics and decision-oriented quantities conditioned on the learned latent history, UniZero enables joint optimization of the long-horizon world model and policy, facilitating broader and more efficient planning in latent space. We demonstrate that UniZero, even with single-frame inputs, matches or surpasses the performance of MuZero-style algorithms on the Atari 100k benchmark. Furthermore, it significantly outperforms prior baselines in benchmarks that require long-term memory. Lastly, we validate the effectiveness and scalability of our design choices through extensive ablation studies, visual analyses, and multi-task learning results. The code is available at https://github.com/opendilab/LightZero.

* 32 pages, 16 figures
