Title: DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

Feb 25, 2024

Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Chenxu Hu, Yang Wang, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao

Abstract: A primary hurdle of autonomous driving in urban environments is understanding complex and long-tail scenarios, such as challenging road conditions and delicate human behaviors. We introduce DriveVLM, an autonomous driving system leveraging Vision-Language Models (VLMs) for enhanced scene understanding and planning capabilities. DriveVLM integrates a unique combination of chain-of-thought (CoT) modules for scene description, scene analysis, and hierarchical planning. Furthermore, recognizing the limitations of VLMs in spatial reasoning and heavy computational requirements, we propose DriveVLM-Dual, a hybrid system that synergizes the strengths of DriveVLM with the traditional autonomous driving pipeline. DriveVLM-Dual achieves robust spatial understanding and real-time inference speed. Extensive experiments on both the nuScenes dataset and our SUP-AD dataset demonstrate the effectiveness of DriveVLM and the enhanced performance of DriveVLM-Dual, surpassing existing methods in complex and unpredictable driving conditions.

* Project Page: https://tsinghua-mars-lab.github.io/DriveVLM/
