Guilian Chen

MonoPGC: Monocular 3D Object Detection with Pixel Geometry Contexts

Feb 21, 2023

Zizhang Wu, Yuanzhu Gan, Lei Wang, Guilian Chen, Jian Pu

Abstract: Monocular 3D object detection is an economical but challenging task in autonomous driving. Recently, center-based monocular methods have developed rapidly, offering a good trade-off between speed and accuracy; they usually estimate the depth of the object center from 2D features. However, visual semantic features that lack sufficient pixel geometry information may provide weak cues for spatial 3D detection. To alleviate this, we propose MonoPGC, a novel end-to-end Monocular 3D object detection framework with rich Pixel Geometry Contexts. We introduce pixel depth estimation as an auxiliary task and design a depth cross-attention pyramid module (DCPM) to inject local and global depth geometry knowledge into visual features. In addition, we present the depth-space-aware transformer (DSAT) to integrate 3D space position and depth-aware features efficiently. We also design a novel depth-gradient positional encoding (DGPE) to bring more distinct pixel geometry contexts into the transformer for better object detection. Extensive experiments demonstrate that our method achieves state-of-the-art performance on the KITTI dataset.

* Accepted by ICRA 2023

MVFusion: Multi-View 3D Object Detection with Semantic-aligned Radar and Camera Fusion

Feb 21, 2023

Zizhang Wu, Guilian Chen, Yuanzhu Gan, Lei Wang, Jian Pu

Abstract: Multi-view radar-camera fused 3D object detection provides a longer detection range and more helpful features for autonomous driving, especially under adverse weather. Current radar-camera fusion methods use a variety of designs to fuse radar information with camera data. However, these approaches usually adopt straightforward concatenation of multi-modal features, which ignores semantic alignment with radar features and sufficient correlation across modalities. In this paper, we present MVFusion, a novel Multi-View radar-camera Fusion method that produces semantic-aligned radar features and enhances cross-modal information interaction. To do so, we inject semantic alignment into the radar features via the semantic-aligned radar encoder (SARE) to produce image-guided radar features. Then, we propose the radar-guided fusion transformer (RGFT) to fuse the radar and image features, strengthening the correlation between the two modalities from a global scope via the cross-attention mechanism. Extensive experiments show that MVFusion achieves state-of-the-art performance (51.7% NDS and 45.3% mAP) on the nuScenes dataset. We shall release our code and trained networks upon publication.
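
The abstract names cross-attention as the fusion mechanism but gives no implementation details. The sketch below is only a generic scaled dot-product cross-attention in plain Python, not the authors' RGFT. The choice of radar tokens as queries and image tokens as keys and values, the toy feature values, and the names `cross_attention`, `radar_tokens`, and `image_tokens` are all assumptions made for illustration; learned projections and multiple heads are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    Each query vector attends over all key vectors and returns a
    weighted average of the corresponding value vectors.
    """
    d = len(keys[0])          # feature dimension shared by queries and keys
    out_dim = len(values[0])  # dimension of the value vectors
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Convex combination of the value vectors.
        fused.append([sum(w * v[j] for w, v in zip(weights, values))
                      for j in range(out_dim)])
    return fused


# Hypothetical toy features: 2 radar tokens and 3 image tokens, 2-D each.
# Assumption: radar tokens act as queries, image tokens as keys and values.
radar_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(radar_tokens, image_tokens, image_tokens)
```

Because every query attends over all keys, each fused radar token can draw on image features from anywhere in the token set, which is one way to read "from a global scope". Plain concatenation, by contrast, only pairs features that already share a position.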

* Accepted by ICRA 2023
