Title: Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving

Oct 29, 2024

Bo Jiang, Shaoyu Chen, Bencheng Liao, Xingyu Zhang, Wei Yin, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang

Abstract: End-to-end autonomous driving demonstrates strong planning capabilities when trained on large-scale data, but it still struggles in complex, rare scenarios because of its limited commonsense. In contrast, Large Vision-Language Models (LVLMs) excel at scene understanding and reasoning. The path forward lies in merging the strengths of both approaches. Previous methods that use LVLMs to predict trajectories or control signals yield suboptimal results, as LVLMs are not well suited to precise numerical prediction. This paper presents Senna, an autonomous driving system that combines an LVLM (Senna-VLM) with an end-to-end model (Senna-E2E). Senna decouples high-level planning from low-level trajectory prediction: Senna-VLM generates planning decisions in natural language, while Senna-E2E predicts precise trajectories. Senna-VLM uses a multi-image encoding approach and multi-view prompts for efficient scene understanding. In addition, we introduce planning-oriented QAs alongside a three-stage training strategy, which enhances Senna-VLM's planning performance while preserving commonsense. Extensive experiments on two datasets show that Senna achieves state-of-the-art planning performance. Notably, with pre-training on the large-scale DriveX dataset and fine-tuning on nuScenes, Senna reduces average planning error by 27.12% and collision rate by 33.33% compared with the model without pre-training. We believe Senna's cross-scenario generalization and transferability are essential for achieving fully autonomous driving. Code and models will be released at https://github.com/hustvl/Senna.
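
The decoupling described in the abstract can be pictured with a small, purely illustrative sketch: a language stage emits a planning decision as text, and a separate numeric stage turns that decision into waypoints. Everything here is an assumption made for illustration and is not taken from the Senna code base: the decision vocabulary `DECISIONS`, the helper names `parse_decision`, `predict_trajectory`, and `plan`, and the toy kinematic rollout that stands in for the learned Senna-E2E model.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical decision vocabulary; the abstract does not list the actual phrases.
DECISIONS = ("keep lane", "turn left", "turn right", "stop")

Waypoint = Tuple[float, float]  # (forward metres, leftward metres) in the ego frame


@dataclass
class Plan:
    decision: str
    trajectory: List[Waypoint]


def parse_decision(vlm_text: str) -> str:
    """Map free-form language output to one known decision (placeholder parser)."""
    text = vlm_text.lower()
    for decision in DECISIONS:
        if decision in text:
            return decision
    return "keep lane"  # assumed fallback when no known phrase is found


def predict_trajectory(decision: str, speed_mps: float,
                       horizon_s: float = 3.0, dt: float = 0.5) -> List[Waypoint]:
    """Stand-in for the end-to-end model: a toy rollout conditioned on the decision."""
    # Assumed constant lateral drift in m/s; a learned model would predict this.
    lateral_rate = {"turn left": 0.5, "turn right": -0.5}.get(decision, 0.0)
    forward_speed = 0.0 if decision == "stop" else speed_mps
    steps = int(horizon_s / dt)
    return [(forward_speed * dt * k, lateral_rate * dt * k)
            for k in range(1, steps + 1)]


def plan(vlm_text: str, speed_mps: float) -> Plan:
    """High-level decision from text first, numeric waypoints second."""
    decision = parse_decision(vlm_text)
    return Plan(decision, predict_trajectory(decision, speed_mps))
```

For example, `plan("The light is red, so the ego vehicle should stop.", 5.0)` resolves to the `"stop"` decision, and the numeric stage then produces waypoints that stay at the origin. The point of the split is that the language stage never has to emit coordinates, which the abstract identifies as a weakness of LVLMs.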

* Project Page: https://github.com/hustvl/Senna
