Yinjia Yi

AccidentBlip2: Accident Detection With Multi-View MotionBlip2

Apr 19, 2024

Yihua Shao, Hongyi Cai, Xinwei Long, Weiyi Lang, Zhe Wang, Haoran Wu, Yan Wang, Yinjia Yi, Yang Yang, Zhen Lei

Figures 1-4 for AccidentBlip2: Accident Detection With Multi-View MotionBlip2

Abstract: Multimodal Large Language Models (MLLMs) have shown outstanding capabilities in many areas of multimodal reasoning. We therefore apply their reasoning ability to environment description and scene understanding in complex transportation environments. In this paper, we propose AccidentBlip2, a multimodal large language model that predicts in real time whether an accident is at risk of occurring. Our approach extracts features from the temporal scene captured by six-view surround-view images and performs temporal inference with the temporal Blip framework through a vision transformer. The generated temporal tokens are then fed into the MLLM, which infers whether an accident will occur. Because AccidentBlip2 does not rely on BEV images or LiDAR, the number of inference parameters and the inference cost of the MLLM are significantly reduced, and training does not incur a large overhead. AccidentBlip2 outperforms existing solutions on the DeepAccident dataset and can also serve as a reference solution for end-to-end accident prediction in automated driving.
