Shaohong Wang

RayFusion: Ray Fusion Enhanced Collaborative Visual Perception

Oct 09, 2025

Shaohong Wang, Bin Lu, Xinyu Xiao, Hanzhi Zhong, Bowen Pang, Tong Wang, Zhiyu Xiang, Hangguan Shan, Eryun Liu

Abstract: Collaborative visual perception methods have gained widespread attention in the autonomous driving community in recent years due to their ability to address sensor limitation problems. However, the absence of explicit depth information often makes it difficult for camera-based perception systems, e.g., 3D object detection, to generate accurate predictions. To alleviate the ambiguity in depth estimation, we propose RayFusion, a ray-based fusion method for collaborative visual perception. Using ray occupancy information from collaborators, RayFusion reduces redundancy and false positive predictions along camera rays, enhancing the detection performance of purely camera-based collaborative perception systems. Comprehensive experiments show that our method consistently outperforms existing state-of-the-art models, substantially advancing the performance of collaborative visual perception. The code is available at https://github.com/wangsh0111/RayFusion.

* Accepted by NeurIPS 2025
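
The depth ambiguity mentioned in the abstract can be illustrated with a standard pinhole camera model: every 3D point along one camera ray projects to the same pixel, so a single image cannot tell the points apart. The sketch below is only an illustration of that general fact, not the RayFusion method; the function names and the intrinsics (`focal`, `cx`, `cy`) are hypothetical values chosen for the example.

```python
def project(point, focal=1000.0, cx=640.0, cy=360.0):
    """Project a 3D camera-frame point (x, y, z) to pixel coordinates.

    Uses the pinhole model; focal, cx and cy are assumed example intrinsics.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (focal * x / z + cx, focal * y / z + cy)


def point_on_ray(direction, depth):
    """Return the point whose z coordinate equals `depth` on a ray from the camera origin."""
    dx, dy, dz = direction
    scale = depth / dz
    return (dx * scale, dy * scale, depth)


# One camera ray, two candidate object positions at different depths.
ray = (0.1, -0.05, 1.0)
near = point_on_ray(ray, 10.0)
far = point_on_ray(ray, 40.0)

pixel_near = project(near)
pixel_far = project(far)

# Both candidates land on the same pixel, so the image alone
# does not reveal which depth is occupied.
same_pixel = all(abs(a - b) < 1e-6 for a, b in zip(pixel_near, pixel_far))
print(pixel_near, pixel_far, same_pixel)
```

A collaborator that observes the scene from another viewpoint sees the two candidates at different pixels, which is the kind of extra information a ray-based fusion scheme can draw on.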

IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception

Jul 13, 2024

Shaohong Wang, Lu Bin, Xinyu Xiao, Zhiyu Xiang, Hangguan Shan, Eryun Liu

Abstract: Multi-agent collaborative perception has emerged as a widely recognized technology in the field of autonomous driving in recent years. However, current collaborative perception predominantly relies on LiDAR point clouds, with significantly less attention given to methods using camera images. This severely impedes the development of budget-constrained collaborative systems and the exploitation of the advantages offered by the camera modality. This work proposes an instance-level fusion transformer for visual collaborative perception (IFTR), which enhances the detection performance of camera-only collaborative perception systems through the communication and sharing of visual features. To capture the visual information from multiple agents, we design an instance feature aggregation that interacts with the visual features of individual agents using predefined grid-shaped bird's-eye-view (BEV) queries, generating more comprehensive and accurate BEV features. Additionally, we devise a cross-domain query adaptation as a heuristic to fuse 2D priors, implicitly encoding the candidate positions of targets. Furthermore, IFTR optimizes communication efficiency by sending instance-level features, achieving an optimal performance-bandwidth trade-off. We evaluate the proposed IFTR on a real dataset, DAIR-V2X, and two simulated datasets, OPV2V and V2XSet, achieving performance improvements of 57.96%, 9.23% and 12.99% in AP@70 compared to the previous SOTAs, respectively. Extensive experiments demonstrate the superiority of IFTR and the effectiveness of its key components. The code is available at https://github.com/wangsh0111/IFTR.
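
As a rough illustration of what "predefined grid-shaped BEV queries" could look like, the sketch below lays out one reference position per cell of a regular ground-plane grid. It is an assumption-laden toy, not the IFTR implementation: the function name `bev_query_grid`, the ranges, and the cell size are all hypothetical, and a real model would attach a learnable embedding to each position.

```python
def bev_query_grid(x_range, y_range, cell_size):
    """Return the (x, y) centre of every cell in a regular BEV grid.

    The result is a list of rows; each row runs along the x axis.
    All arguments are in metres and are example values, not values from the paper.
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    nx = int(round((x_max - x_min) / cell_size))
    ny = int(round((y_max - y_min) / cell_size))
    return [
        [
            (x_min + (i + 0.5) * cell_size, y_min + (j + 0.5) * cell_size)
            for i in range(nx)
        ]
        for j in range(ny)
    ]


# A tiny 8 m x 4 m region with 2 m cells gives a 2 x 4 grid of query positions.
grid = bev_query_grid((-4.0, 4.0), (-2.0, 2.0), 2.0)
for row in grid:
    print(row)
```

Each grid position would act as one query that gathers image features from the agents whose cameras cover that location.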
