Manuel López-Antequera

SparseFormer: Attention-based Depth Completion Network

Jun 09, 2022

Frederik Warburg, Michael Ramamonjisoa, Manuel López-Antequera

Abstract: Most pipelines for Augmented and Virtual Reality estimate the ego-motion of the camera by creating a map of sparse 3D landmarks. In this paper, we tackle the problem of depth completion, that is, densifying this sparse 3D map using RGB images as guidance. This remains a challenging problem because the 3D landmarks produced by SfM and SLAM pipelines are low-density, non-uniform, and outlier-prone. We introduce a transformer block, SparseFormer, that fuses 3D landmarks with deep visual features to produce dense depth. The SparseFormer has a global receptive field, making the module especially effective for depth completion with low-density and non-uniform landmarks. To address the issue of depth outliers among the 3D landmarks, we introduce a trainable refinement module that filters outliers through attention between the sparse landmarks.

* Accepted at CV4ARVR 2022
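The abstract's point about a global receptive field can be illustrated with plain scaled dot-product cross-attention, where every image location attends to every sparse landmark regardless of distance. This is a minimal sketch, not the SparseFormer block itself: the function name `cross_attention`, the feature sizes, and the absence of learned projections are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(pixel_feats, landmark_feats):
    """Hypothetical helper: image locations (queries) attend to landmarks.

    pixel_feats:    (P, D) array, one feature vector per image location.
    landmark_feats: (N, D) array, one feature vector per sparse landmark,
                    used as both keys and values (no learned projections).
    Returns the (P, D) fused features and the (P, N) attention weights.
    """
    d = pixel_feats.shape[1]
    scores = pixel_feats @ landmark_feats.T / np.sqrt(d)
    weights = softmax(scores, axis=1)
    return weights @ landmark_feats, weights

rng = np.random.default_rng(0)
pixel_feats = rng.normal(size=(6, 4))     # 6 image locations, 4-dim features (toy sizes)
landmark_feats = rng.normal(size=(3, 4))  # 3 sparse landmarks
fused, weights = cross_attention(pixel_feats, landmark_feats)
```

Each row of `weights` sums to one and has a nonzero entry for every landmark, so no image location is cut off from a landmark by spatial distance, unlike a convolution with a fixed kernel size.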

Disentangling Monocular 3D Object Detection

May 29, 2019

Andrea Simonelli, Samuel Rota Bulò, Lorenzo Porzi, Manuel López-Antequera, Peter Kontschieder

Abstract: In this paper we propose an approach for monocular 3D object detection from a single RGB image, which leverages a novel disentangling transformation for 2D and 3D detection losses and a novel, self-supervised confidence score for 3D bounding boxes. Our proposed loss disentanglement has the twofold advantage of simplifying the training dynamics in the presence of losses with complex interactions of parameters, and sidestepping the issue of balancing independent regression terms. Our solution overcomes these issues by isolating the contribution made by groups of parameters to a given loss, without changing its nature. We further apply loss disentanglement to another novel, signed Intersection-over-Union criterion-driven loss for improving 2D detection results. Besides our methodological innovations, we critically review the AP metric used in KITTI3D, which emerged as the most important dataset for comparing 3D detection results. We identify and resolve a flaw in the 11-point interpolated AP metric, which affects all previously published detection results and particularly biases the results of monocular 3D detection. We provide extensive experimental evaluations and ablation studies on the KITTI3D and nuScenes datasets, setting new state-of-the-art results on the car object category by large margins.

* Project website at https://research.mapillary.com/publication/MonoDIS/
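For readers unfamiliar with the 11-point interpolated AP metric mentioned in the abstract, the sketch below computes it from a precision/recall curve. It averages, over the recall levels 0, 0.1, ..., 1, the maximum precision achieved at any recall at or above that level. The function name `interpolated_ap` and the toy numbers are illustrative assumptions, and the 40-point variant that skips recall 0 is included only as one possible way to avoid the recall-0 term, not as something stated in the abstract.

```python
import numpy as np

def interpolated_ap(recall, precision, recall_points):
    """Average the interpolated precision over the given recall levels.

    The interpolated precision at level r is the maximum precision among
    all curve points whose recall is >= r (0 if there is no such point).
    """
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    total = 0.0
    for r in recall_points:
        mask = recall >= r
        total += precision[mask].max() if mask.any() else 0.0
    return total / len(recall_points)

eleven_points = np.linspace(0.0, 1.0, 11)        # 0, 0.1, ..., 1 (includes recall 0)
forty_points = np.linspace(1.0 / 40, 1.0, 40)    # assumed variant: 1/40, ..., 1 (no recall 0)

# Toy case: a detector that finds 1 of 100 objects with a single, correct detection.
recall = [0.01]
precision = [1.0]
ap11 = interpolated_ap(recall, precision, eleven_points)
ap40 = interpolated_ap(recall, precision, forty_points)
```

In this toy case `ap11` equals 1/11 (about 0.09) because the recall-0 level is satisfied by any curve point, while `ap40` is 0. This shows how including the recall-0 level can reward a nearly useless detector, which matters most when absolute AP values are low.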
