Abstract:Collaborative Perception (CP) is a process in which an ego agent receives and fuses sensor information from surrounding vehicles and infrastructure to enhance its perception capability. To evaluate the need for infrastructure equipped with sensors, extensive and quantitative analysis of the role of infrastructure data in CP is crucial, yet remains underexplored. To address this gap, we first quantitatively assess the importance of infrastructure data in existing vehicle-centric CP, where the ego agent is a vehicle. Furthermore, we compare vehicle-centric CP with infra-centric CP, where the ego agent is now the infrastructure, to evaluate the effectiveness of each approach. Our results demonstrate that incorporating infrastructure data improves 3D detection accuracy by up to 10.87%, and infra-centric CP shows enhanced noise robustness and increases accuracy by up to 42.53% compared with vehicle-centric CP.
Abstract:In this paper, we explore the application of the Decision Transformer, a decision-making algorithm based on the Generative Pre-trained Transformer (GPT) architecture, to multi-vehicle coordination at unsignalized intersections. We formulate the coordination problem so as to find the optimal trajectories for multiple vehicles at intersections, modeling it as a sequence prediction task to fully leverage the power of GPTs as a sequence model. Through extensive experiments, we compare our approach to a reservation-based intersection management system. Our results show that the Decision Transformer can outperform the training data in terms of total travel time and can be generalized effectively to various scenarios, including noise-induced velocity variations, continuous interaction environments, and different vehicle numbers and road configurations.
Abstract:In this paper, we investigate improving the perception performance of autonomous vehicles through communication with other vehicles and road infrastructures. To this end, we introduce a collaborative perception model $\textbf{V2X-M2C}$, consisting of multiple modules, each generating inter-agent complementary information, spatial global context, and spatial local information. Inspired by the question of why most existing architectures are sequential, we analyze both the $\textit{sequential}$ and $\textit{parallel}$ connections of the modules. The sequential connection synergizes the modules, whereas the parallel connection independently improves each module. Extensive experiments demonstrate that V2X-M2C achieves state-of-the-art perception performance, increasing the detection accuracy by 8.00% to 10.87% and decreasing the FLOPs by 42.81% to 52.64%.