Zhiying Song

Wireless Communication as an Information Sensor for Multi-agent Cooperative Perception: A Survey

Apr 30, 2025

Zhiying Song, Tenghui Xie, Fuxi Wen, Jun Li

Abstract: Cooperative perception extends the perception capabilities of autonomous vehicles by enabling multi-agent information sharing via Vehicle-to-Everything (V2X) communication. Unlike traditional onboard sensors, V2X acts as a dynamic "information sensor" characterized by limited communication, heterogeneity, mobility, and scalability. This survey provides a comprehensive review of recent advancements from the perspective of information-centric cooperative perception, focusing on three key dimensions: information representation, information fusion, and large-scale deployment. We categorize information representation into data-level, feature-level, and object-level schemes, and highlight emerging methods for reducing data volume and compressing messages under communication constraints. In information fusion, we explore techniques under both ideal and non-ideal conditions, including those addressing heterogeneity, localization errors, latency, and packet loss. Finally, we summarize system-level approaches to support scalability in dense traffic scenarios. Compared with existing surveys, this paper introduces a new perspective by treating V2X communication as an information sensor and emphasizing the challenges of deploying cooperative perception in real-world intelligent transportation systems.

TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception

Mar 25, 2025

Zhiying Song, Lei Yang, Fuxi Wen, Jun Li

Abstract: Cooperative perception presents significant potential for enhancing the sensing capabilities of individual vehicles; however, inter-agent latency remains a critical challenge. Latencies cause misalignments in both spatial and semantic features, complicating the fusion of real-time observations from the ego vehicle with delayed data from others. To address these issues, we propose TraF-Align, a novel framework that learns the flow path of features by predicting the feature-level trajectory of objects from past observations up to the ego vehicle's current time. By generating temporally ordered sampling points along these paths, TraF-Align directs attention from the current-time query to relevant historical features along each trajectory, supporting the reconstruction of current-time features and promoting semantic interaction across multiple frames. This approach corrects spatial misalignment and ensures semantic consistency across agents, effectively compensating for motion and achieving coherent feature fusion. Experiments on two real-world datasets, V2V4Real and DAIR-V2X-Seq, show that TraF-Align sets a new benchmark for asynchronous cooperative perception.

* Accepted to CVPR 2025

V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception

Nov 17, 2024

Lei Yang, Xinyu Zhang, Jun Li, Chen Wang, Zhiying Song, Tong Zhao, Ziying Song, Li Wang, Mo Zhou, Yang Shen (+2 more)

Abstract: Modern autonomous vehicle perception systems often struggle with occlusions and limited perception range. Previous studies have demonstrated the effectiveness of cooperative perception in extending the perception range and overcoming occlusions, thereby improving the safety of autonomous driving. In recent years, a series of cooperative perception datasets have emerged. However, these datasets focus only on camera and LiDAR, overlooking 4D Radar, a sensor employed in single-vehicle autonomous driving for robust perception in adverse weather conditions. In this paper, to bridge the gap of missing 4D Radar datasets in cooperative perception, we present V2X-Radar, the first large real-world multi-modal dataset featuring 4D Radar. Our V2X-Radar dataset is collected using a connected vehicle platform and an intelligent roadside unit equipped with 4D Radar, LiDAR, and multi-view cameras. The collected data covers sunny and rainy weather conditions, spanning daytime, dusk, and nighttime, as well as typical challenging scenarios. The dataset comprises 20K LiDAR frames, 40K camera images, and 20K 4D Radar data, with 350K annotated bounding boxes across five categories. To facilitate diverse research domains, we establish V2X-Radar-C for cooperative perception, V2X-Radar-I for roadside perception, and V2X-Radar-V for single-vehicle perception. We further provide comprehensive benchmarks of recent perception algorithms on the above three sub-datasets. The dataset and benchmark codebase will be available at http://openmpd.com/column/V2X-Radar.

* 11 pages, 5 figures

A Spatial Calibration Method for Robust Cooperative Perception

Apr 25, 2023

Zhiying Song, Tenghui Xie, Hailiang Zhang, Fuxi Wen, Jun Li

Abstract: Cooperative perception is a promising technique for enhancing the perception capabilities of automated vehicles through vehicle-to-everything (V2X) cooperation, provided that accurate relative pose transforms are available. Nevertheless, obtaining precise positioning information often entails high costs associated with navigation systems. Moreover, signal drift resulting from factors such as occlusion and multipath effects can compromise the stability of the positioning information. Hence, a low-cost and robust method is required to calibrate relative pose information for multi-agent cooperative perception. In this paper, we propose a simple but effective inter-agent object association approach (CBM), which constructs contexts using the detected bounding boxes, followed by local context matching and global consensus maximization. Based on the matched correspondences, the optimal relative pose transform is estimated, followed by cooperative perception fusion. Extensive experiments on both simulated and real-world datasets show that high object association precision and decimeter-level relative pose calibration accuracy are achieved among the cooperating agents, even with larger inter-agent localization errors. Furthermore, the proposed approach outperforms the state-of-the-art methods in terms of object association and relative pose estimation accuracy, as well as the robustness of cooperative perception against the pose errors of the connected agents. The code will be available at https://github.com/zhyingS/CBM.
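
To illustrate the step of estimating a relative pose transform from matched correspondences, here is a minimal sketch in Python. It assumes the matches are already given as paired 2D box centers and uses the standard closed-form least-squares fit for a planar rotation and translation. The function names `estimate_rigid_2d` and `apply_rigid_2d` are hypothetical, and this is not the CBM implementation from the repository above; it only shows the kind of computation such a step may involve.

```python
import math


def apply_rigid_2d(point, theta, tx, ty):
    """Rotate a 2D point by theta (radians), then translate by (tx, ty)."""
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty)


def estimate_rigid_2d(src, dst):
    """Least-squares rotation and translation mapping src[i] onto dst[i].

    src and dst are equal-length lists of (x, y) centers of matched boxes
    (an assumption of this sketch: the association is already solved).
    """
    n = len(src)
    if n < 2 or n != len(dst):
        raise ValueError("need at least two matched point pairs")

    # Centroids of both point sets.
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(p[0] for p in dst) / n
    dy = sum(p[1] for p in dst) / n

    # Accumulate dot and cross products of the centered pairs; their ratio
    # gives the optimal rotation angle in closed form.
    dot = 0.0
    cross = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay = ax - sx, ay - sy
        bx, by = bx - dx, by - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)

    # Translation aligns the rotated source centroid with the target centroid.
    c, s = math.cos(theta), math.sin(theta)
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, tx, ty
```

With noise-free matches the fit recovers the transform exactly; with noisy box centers it returns the least-squares estimate, and wrong matches would need to be rejected beforehand.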

An Efficient and Robust Object-Level Cooperative Perception Framework for Connected and Automated Driving

Oct 12, 2022

Zhiying Song, Fuxi Wen, Hailiang Zhang, Jun Li

Abstract: Cooperative perception is challenging for connected and automated driving because of real-time requirements and bandwidth limitations, especially when the vehicle location and pose information are inaccurate. We propose an efficient object-level cooperative perception framework, in which the 3D bounding boxes, location, and pose data are broadcast and received between the connected vehicles, then fused at the object level. Two matching algorithms, based on Iterative Closest Point (ICP) and optimal transport theory, are developed to maximize the total correlations between the 3D bounding boxes jointly detected by the vehicles. Experimental results show that it takes only 5 ms to associate objects from different vehicles for each frame, and robust performance is achieved for different levels of location and heading errors. Meanwhile, the proposed framework outperforms the state-of-the-art benchmark methods when location or pose errors occur.
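
As a rough illustration of object-level association between two vehicles, the sketch below pairs box centers one-to-one by ascending distance within a gating radius. This is a deliberately simple greedy stand-in, not the ICP-based or optimal-transport-based matching described in the abstract. The function name `associate_greedy`, the `max_dist` parameter, and the assumption that remote boxes have already been transformed into the ego frame are all hypothetical choices made for this example.

```python
import math


def associate_greedy(ego_centers, remote_centers, max_dist=2.0):
    """Pair detections one-to-one by ascending center distance.

    ego_centers, remote_centers: lists of (x, y) box centers, both assumed
    to be expressed in the ego vehicle's frame.
    max_dist: gating radius in the same units; farther pairs are ignored.
    Returns a sorted list of (ego_index, remote_index) matches.
    """
    # Collect every candidate pair that passes the distance gate.
    candidates = []
    for i, (ex, ey) in enumerate(ego_centers):
        for j, (rx, ry) in enumerate(remote_centers):
            d = math.hypot(ex - rx, ey - ry)
            if d <= max_dist:
                candidates.append((d, i, j))

    # Accept the closest pairs first, using each detection at most once.
    candidates.sort()
    used_ego, used_remote, matches = set(), set(), []
    for _, i, j in candidates:
        if i not in used_ego and j not in used_remote:
            matches.append((i, j))
            used_ego.add(i)
            used_remote.add(j)
    return sorted(matches)
```

Greedy gating degrades quickly when location or heading errors exceed the gate, which is one reason a globally optimized matching may be preferred in practice.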
