Abstract:Autonomous driving, particularly navigating complex and unanticipated scenarios, demands sophisticated reasoning and planning capabilities. While Multi-modal Large Language Models (MLLMs) offer a promising avenue for this, their use has been largely confined to understanding complex environmental contexts or generating high-level driving commands, with few studies extending their application to end-to-end path planning. A major research bottleneck is the lack of large-scale annotated datasets encompassing vision, language, and action. To address this issue, we propose the CoVLA (Comprehensive Vision-Language-Action) Dataset, an extensive dataset comprising more than 80 hours of real-world driving videos. This dataset leverages a novel, scalable approach based on automated data processing and a caption generation pipeline to produce accurate driving trajectories paired with detailed natural language descriptions of driving environments and maneuvers. This approach utilizes raw in-vehicle sensor data, allowing it to surpass existing datasets in scale and annotation richness. Using CoVLA, we investigate the driving capabilities of MLLMs that can handle vision, language, and action in a variety of driving scenarios. Our results illustrate the strong proficiency of our model in generating coherent language and action outputs, emphasizing the potential of Vision-Language-Action (VLA) models in the field of autonomous driving. This dataset establishes a framework for robust, interpretable, and data-driven autonomous driving systems by providing a comprehensive platform for training and evaluating VLA models, contributing to safer and more reliable self-driving vehicles. The dataset is released for academic purposes.
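To make the vision-language-action pairing described above concrete, the sketch below shows one plausible way a single sample could be represented in code. The field names (frame_path, caption, trajectory) and the values are illustrative assumptions, not the actual CoVLA schema.

```python
# Illustrative only: a minimal sketch of one vision-language-action sample.
# Field names and values are hypothetical and do not reflect the real CoVLA format.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class VLASample:
    frame_path: str                         # camera frame from the driving video
    caption: str                            # natural language description of scene and maneuver
    trajectory: List[Tuple[float, float]]   # future ego waypoints (x, y) in meters

sample = VLASample(
    frame_path="frames/000123.jpg",
    caption="The ego vehicle slows down as a pedestrian crosses at the intersection ahead.",
    trajectory=[(0.0, 0.0), (0.8, 0.1), (1.5, 0.2)],
)
print(sample.caption)
```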
Abstract:Fully autonomous driving has been widely studied and is becoming increasingly feasible. However, such autonomous driving has yet to be achieved on public roads, owing to various uncertainties arising from surrounding human drivers and pedestrians. In this paper, we present an end-to-end learning-based autonomous driving system named SuperDriver AI, in which Deep Neural Networks (DNNs) learn driving actions and policies from experienced human drivers and determine the driving maneuvers to take while guaranteeing road safety. In addition, to improve robustness and interpretability, we present a slit model and a visual attention module. We build a data-collection system and an emulator with real-world hardware, and we test the SuperDriver AI system in real-world driving scenarios. Finally, we collected 150 runs for one driving scenario in Tokyo, Japan, and demonstrated SuperDriver AI on a real-world vehicle.
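As a rough illustration of the kind of end-to-end network with visual attention that the abstract describes, the following is a minimal PyTorch sketch mapping a camera image to a driving command. The layer sizes, the 1x1-convolution attention formulation, and the two-dimensional output are assumptions for illustration, not the SuperDriver AI architecture.

```python
# Illustrative only: a toy end-to-end driving network with a simple spatial
# attention map over image features. Not the SuperDriver AI architecture.
import torch
import torch.nn as nn

class AttentionDrivingNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(           # extracts a feature map from the camera image
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
        )
        self.attn = nn.Conv2d(32, 1, 1)          # 1x1 conv producing spatial attention logits
        self.head = nn.Linear(32, 2)             # predicts [steering, throttle]

    def forward(self, img):                      # img: (B, 3, H, W)
        feat = self.backbone(img)                # (B, 32, h, w)
        weights = torch.softmax(self.attn(feat).flatten(2), dim=-1)  # (B, 1, h*w)
        pooled = (feat.flatten(2) * weights).sum(dim=-1)             # attention-weighted pooling -> (B, 32)
        return self.head(pooled)                 # (B, 2) driving command

net = AttentionDrivingNet()
cmd = net(torch.randn(1, 3, 120, 160))           # one dummy frame -> steering/throttle prediction
```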