Title: DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences

Jun 05, 2024

Yidong Huang, Jacob Sansom, Ziqiao Ma, Felix Gervits, Joyce Chai

[Figures 1, 2, 3, and 4 of the paper]

Abstract: Recent advancements in foundation models (FMs) have unlocked new prospects in autonomous driving, yet the experimental settings of these studies are preliminary, over-simplified, and fail to capture the complexity of real-world driving scenarios in human environments. It remains under-explored whether FM agents can handle long-horizon navigation tasks with free-form dialogue and deal with unexpected situations caused by environmental dynamics or task changes. To explore the capabilities and boundaries of FMs when faced with these challenges, we introduce DriVLMe, a video-language-model-based agent that facilitates natural and effective communication between humans and autonomous vehicles that perceive the environment and navigate. We develop DriVLMe from both embodied experiences in a simulated environment and social experiences from real human dialogue. While DriVLMe demonstrates competitive performance in both open-loop benchmarks and closed-loop human studies, we reveal several limitations and challenges, including unacceptable inference time, imbalanced training data, limited visual understanding, challenges with multi-turn interactions, simplified language generation from robotic experiences, and difficulties in handling on-the-fly unexpected situations like environmental dynamics and task changes.

* First Vision and Language for Autonomous Driving and Robotics Workshop (VLADR @ CVPR 2024)
