Title: BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space

Jul 08, 2024

Yumeng Zhang, Shi Gong, Kaixin Xiong, Xiaoqing Ye, Xiao Tan, Fan Wang, Jizhou Huang, Hua Wu, Haifeng Wang

Abstract: World models are receiving increasing attention in autonomous driving for their ability to predict potential future scenarios. In this paper, we present BEVWorld, a novel approach that tokenizes multimodal sensor inputs into a unified and compact Bird's Eye View (BEV) latent space for environment modeling. The world model consists of two parts: a multimodal tokenizer and a latent BEV sequence diffusion model. The tokenizer's encoder first maps multimodal sensor information into latent BEV tokens, and its decoder reconstructs LiDAR and image observations from those tokens via ray-casting rendering, in a self-supervised manner. The latent BEV sequence diffusion model then predicts future scenarios conditioned on action tokens. Experiments demonstrate the effectiveness of BEVWorld in autonomous driving tasks, showing that it can generate future scenes and that it benefits downstream tasks such as perception and motion prediction. Code will be available at https://github.com/zympsyche/BevWorld.

* 10 pages
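
The two-stage data flow described in the abstract (tokenize sensor inputs into a BEV latent, predict a future latent conditioned on an action, decode back to observations) can be pictured with the toy sketch below. It is not the BEVWorld implementation: every function name, the single-channel grid, the averaging "encoder", the copy "decoder" (the paper uses ray-casting rendering, which is not modeled here), and the column-shift "denoiser" are assumptions made purely for illustration.

```python
import random
from typing import List, Tuple

# Hypothetical toy types and functions; none of these names come from BEVWorld.
Grid = List[List[float]]  # a single-channel H x W grid standing in for a BEV latent


def tokenize(camera: Grid, lidar: Grid) -> Grid:
    """Toy encoder: fuse two sensor grids into one BEV latent by averaging."""
    return [[0.5 * (c + l) for c, l in zip(cr, lr)] for cr, lr in zip(camera, lidar)]


def detokenize(latent: Grid) -> Tuple[Grid, Grid]:
    """Toy decoder: returns copies of the latent as stand-ins for an image and a LiDAR map.
    The real decoder is described as using ray-casting rendering; that is not modeled."""
    image = [row[:] for row in latent]
    points = [row[:] for row in latent]
    return image, points


def shift_columns(latent: Grid, dx: int) -> Grid:
    """Shift the grid by dx columns with zero fill; a crude stand-in for ego motion."""
    width = len(latent[0])
    return [
        [row[j - dx] if 0 <= j - dx < width else 0.0 for j in range(width)]
        for row in latent
    ]


def predict_next_latent(latent: Grid, action_dx: int, steps: int = 10, seed: int = 0) -> Grid:
    """Toy action-conditioned iterative denoising.
    Starts from Gaussian noise and moves step by step toward the 'denoiser' estimate,
    which here is simply the current latent shifted by the action (an assumption)."""
    rng = random.Random(seed)
    target = shift_columns(latent, action_dx)
    x = [[rng.gauss(0.0, 1.0) for _ in row] for row in latent]
    for t in range(steps, 0, -1):
        # Close 1/t of the remaining gap; at t == 1 the sample reaches the estimate.
        x = [[xv + (tv - xv) / t for xv, tv in zip(xr, tr)] for xr, tr in zip(x, target)]
    return x


# Usage: one obstacle column seen by both sensors, then a one-cell action.
camera = [[0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
lidar = [[0.0, 1.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
latent = tokenize(camera, lidar)
future = predict_next_latent(latent, action_dx=1)
future_image, future_points = detokenize(future)
```

The sketch is meant to show only that prediction happens entirely in the latent space, with sensor-space outputs produced by decoding afterwards; a learned diffusion model would replace the hand-written shift with a network that estimates the clean future latent from the noisy sample, the history, and the action tokens.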
