Title: VLM-AD: End-to-End Autonomous Driving through Vision-Language Model Supervision

Dec 19, 2024

Yi Xu, Yuxin Hu, Zaiwei Zhang, Gregory P. Meyer, Siva Karthik Mustikovela, Siddhartha Srinivasa, Eric M. Wolff, Xin Huang

Abstract: Human drivers rely on commonsense reasoning to navigate diverse and dynamic real-world scenarios. Existing end-to-end (E2E) autonomous driving (AD) models are typically optimized to mimic driving patterns observed in data, without capturing the underlying reasoning processes. This limitation constrains their ability to handle challenging driving scenarios. To close this gap, we propose VLM-AD, a method that leverages vision-language models (VLMs) as teachers to enhance training by providing additional supervision that incorporates unstructured reasoning information and structured action labels. Such supervision enhances the model's ability to learn richer feature representations that capture the rationale behind driving patterns. Importantly, our method does not require a VLM during inference, making it practical for real-time deployment. When integrated with state-of-the-art methods, VLM-AD significantly improves planning accuracy and reduces collision rates on the nuScenes dataset.
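
The abstract does not spell out the loss functions, so the following is only a minimal sketch of how training-time supervision from a VLM teacher could be combined with a planning objective. Every name and design choice here is an assumption made for illustration, not the paper's formulation: the unstructured reasoning text is assumed to arrive as a precomputed embedding aligned to a planner feature by cosine distance, the structured action label is assumed to be a class index trained with cross-entropy, and the weights `w_text` and `w_action` are hypothetical.

```python
import math

def l2_planning_loss(pred_traj, gt_traj):
    """Mean squared distance between predicted and ground-truth (x, y) waypoints."""
    total = sum((px - gx) ** 2 + (py - gy) ** 2
                for (px, py), (gx, gy) in zip(pred_traj, gt_traj))
    return total / len(gt_traj)

def cosine_alignment_loss(feature, text_embedding):
    """1 - cosine similarity between a planner feature and the embedding of the
    VLM's free-form reasoning text (assumed form of the unstructured supervision)."""
    dot = sum(f * t for f, t in zip(feature, text_embedding))
    norm = (math.sqrt(sum(f * f for f in feature))
            * math.sqrt(sum(t * t for t in text_embedding)))
    return 1.0 - dot / norm

def cross_entropy(logits, label):
    """Softmax cross-entropy for one structured action label (a class index)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]

def training_loss(pred_traj, gt_traj, feature, text_embedding,
                  action_logits, action_label, w_text=1.0, w_action=1.0):
    """Planning loss plus two auxiliary terms derived from VLM annotations.
    The weights are hypothetical hyperparameters."""
    return (l2_planning_loss(pred_traj, gt_traj)
            + w_text * cosine_alignment_loss(feature, text_embedding)
            + w_action * cross_entropy(action_logits, action_label))

def plan_at_inference(planner, sensor_input):
    """At inference only the planner runs; no VLM call and no auxiliary heads."""
    return planner(sensor_input)
```

In this sketch the VLM outputs appear only as arguments to `training_loss`, which mirrors the abstract's point that no VLM is needed at inference time.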
