Abstract: Collaborative Perception (CP) is a process in which an ego agent receives and fuses sensor information from surrounding vehicles and infrastructure to enhance its perception capability. To evaluate the need for sensor-equipped infrastructure, an extensive, quantitative analysis of the role of infrastructure data in CP is crucial, yet such analysis remains underexplored. To address this gap, we first quantitatively assess the importance of infrastructure data in existing vehicle-centric CP, where the ego agent is a vehicle. We then compare vehicle-centric CP with infra-centric CP, where the ego agent is the infrastructure, to evaluate the effectiveness of each approach. Our results demonstrate that incorporating infrastructure data improves 3D detection accuracy by up to 10.87%, and that infra-centric CP exhibits greater noise robustness and increases accuracy by up to 42.53% compared with vehicle-centric CP.
Abstract: In this paper, we investigate improving the perception performance of autonomous vehicles through communication with other vehicles and road infrastructure. To this end, we introduce a collaborative perception model, $\textbf{V2X-M2C}$, consisting of multiple modules that generate inter-agent complementary information, spatial global context, and spatial local information. Motivated by the question of why most existing architectures are sequential, we analyze both $\textit{sequential}$ and $\textit{parallel}$ connections of the modules. The sequential connection synergizes the modules, whereas the parallel connection improves each module independently. Extensive experiments demonstrate that V2X-M2C achieves state-of-the-art perception performance, increasing detection accuracy by 8.00% to 10.87% while decreasing FLOPs by 42.81% to 52.64%.
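The abstract does not specify the module internals or the fusion rule, so the following is only a minimal sketch of the sequential versus parallel connection patterns it contrasts. The `M2CBlock` placeholder, the sum fusion in the parallel variant, and the BEV feature shapes are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class M2CBlock(nn.Module):
    """Hypothetical stand-in for one module (e.g., complementary info,
    global context, or local information); the real designs differ."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.conv(x))


class SequentialConnection(nn.Module):
    """Sequential connection: each module refines the previous module's output."""
    def __init__(self, channels: int):
        super().__init__()
        self.complementary = M2CBlock(channels)  # inter-agent complementary information
        self.global_ctx = M2CBlock(channels)     # spatial global context
        self.local_info = M2CBlock(channels)     # spatial local information

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.complementary(x)
        x = self.global_ctx(x)
        return self.local_info(x)


class ParallelConnection(nn.Module):
    """Parallel connection: modules run independently on the same input;
    their outputs are fused here by a simple sum (an assumption)."""
    def __init__(self, channels: int):
        super().__init__()
        self.complementary = M2CBlock(channels)
        self.global_ctx = M2CBlock(channels)
        self.local_info = M2CBlock(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.complementary(x) + self.global_ctx(x) + self.local_info(x)


# Both variants map a fused BEV feature map to an enhanced map of the same shape.
feat = torch.randn(1, 64, 100, 100)
print(SequentialConnection(64)(feat).shape)  # torch.Size([1, 64, 100, 100])
print(ParallelConnection(64)(feat).shape)    # torch.Size([1, 64, 100, 100])
```

Under this sketch, the design question the abstract raises reduces to whether the modules should see each other's outputs (sequential, synergistic) or only the shared input (parallel, independent improvements).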