Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jason Rambach

Resolving Symmetry Ambiguity in Correspondence-based Methods for Instance-level Object Pose Estimation

May 17, 2024

Yongliang Lin, Yongzhi Su, Sandeep Inuganti, Yan Di, Naeem Ajilforoushan, Hanqing Yang, Yu Zhang, Jason Rambach

Abstract:Estimating the 6D pose of an object from a single RGB image is a critical task that becomes additionally challenging when dealing with symmetric objects. Recent approaches typically establish one-to-one correspondences between image pixels and 3D object surface vertices. However, the utilization of one-to-one correspondences introduces ambiguity for symmetric objects. To address this, we propose SymCode, a symmetry-aware surface encoding that encodes the object surface vertices based on one-to-many correspondences, eliminating the problem of one-to-one correspondence ambiguity. We also introduce SymNet, a fast end-to-end network that directly regresses the 6D pose parameters without solving a PnP problem. We demonstrate faster runtime and comparable accuracy achieved by our method on the T-LESS and IC-BIN benchmarks of mostly symmetric objects. Our source code will be released upon acceptance.

* 8 pages,10 figures

Via

Access Paper or Ask Questions

HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation

Nov 21, 2023

Yongliang Lin, Yongzhi Su, Praveen Nathan, Sandeep Inuganti, Yan Di, Martin Sundermeyer, Fabian Manhardt, Didier Stricke, Jason Rambach, Yu Zhang

Figure 1 for HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation

Figure 2 for HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation

Figure 3 for HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation

Figure 4 for HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation

Abstract:In this work, we present a novel dense-correspondence method for 6DoF object pose estimation from a single RGB-D image. While many existing data-driven methods achieve impressive performance, they tend to be time-consuming due to their reliance on rendering-based refinement approaches. To circumvent this limitation, we present HiPose, which establishes 3D-3D correspondences in a coarse-to-fine manner with a hierarchical binary surface encoding. Unlike previous dense-correspondence methods, we estimate the correspondence surface by employing point-to-surface matching and iteratively constricting the surface until it becomes a correspondence point while gradually removing outliers. Extensive experiments on public benchmarks LM-O, YCB-V, and T-Less demonstrate that our method surpasses all refinement-free methods and is even on par with expensive refinement-based approaches. Crucially, our approach is computationally efficient and enables real-time critical applications with high accuracy requirements. Code and models will be released.

Via

Access Paper or Ask Questions

Cross-Dataset Experimental Study of Radar-Camera Fusion in Bird's-Eye View

Sep 27, 2023

Lukas Stäcker, Philipp Heidenreich, Jason Rambach, Didier Stricker

Figure 1 for Cross-Dataset Experimental Study of Radar-Camera Fusion in Bird's-Eye View

Figure 2 for Cross-Dataset Experimental Study of Radar-Camera Fusion in Bird's-Eye View

Figure 3 for Cross-Dataset Experimental Study of Radar-Camera Fusion in Bird's-Eye View

Figure 4 for Cross-Dataset Experimental Study of Radar-Camera Fusion in Bird's-Eye View

Abstract:By exploiting complementary sensor information, radar and camera fusion systems have the potential to provide a highly robust and reliable perception system for advanced driver assistance systems and automated driving functions. Recent advances in camera-based object detection offer new radar-camera fusion possibilities with bird's eye view feature maps. In this work, we propose a novel and flexible fusion network and evaluate its performance on two datasets: nuScenes and View-of-Delft. Our experiments reveal that while the camera branch needs large and diverse training data, the radar branch benefits more from a high-performance radar. Using transfer learning, we improve the camera's performance on the smaller dataset. Our results further demonstrate that the radar-camera fusion approach significantly outperforms the camera-only and radar-only baselines.

* EUSIPCO 2023

Via

Access Paper or Ask Questions

Single Frame Semantic Segmentation Using Multi-Modal Spherical Images

Aug 18, 2023

Suresh Guttikonda, Jason Rambach

Abstract:In recent years, the research community has shown a lot of interest to panoramic images that offer a 360-degree directional perspective. Multiple data modalities can be fed, and complimentary characteristics can be utilized for more robust and rich scene interpretation based on semantic segmentation, to fully realize the potential. Existing research, however, mostly concentrated on pinhole RGB-X semantic segmentation. In this study, we propose a transformer-based cross-modal fusion architecture to bridge the gap between multi-modal fusion and omnidirectional scene perception. We employ distortion-aware modules to address extreme object deformations and panorama distortions that result from equirectangular representation. Additionally, we conduct cross-modal interactions for feature rectification and information exchange before merging the features in order to communicate long-range contexts for bi-modal and tri-modal feature streams. In thorough tests using combinations of four different modality types in three indoor panoramic-view datasets, our technique achieved state-of-the-art mIoU performance: 60.60% on Stanford2D3DS (RGB-HHA), 71.97% Structured3D (RGB-D-N), and 35.92% Matterport3D (RGB-D). We plan to release all codes and trained models soon.

* Accepted at WACV 2024

Via

Access Paper or Ask Questions

U-RED: Unsupervised 3D Shape Retrieval and Deformation for Partial Point Clouds

Aug 11, 2023

Yan Di, Chenyangguang Zhang, Ruida Zhang, Fabian Manhardt, Yongzhi Su, Jason Rambach, Didier Stricker, Xiangyang Ji, Federico Tombari

Abstract:In this paper, we propose U-RED, an Unsupervised shape REtrieval and Deformation pipeline that takes an arbitrary object observation as input, typically captured by RGB images or scans, and jointly retrieves and deforms the geometrically similar CAD models from a pre-established database to tightly match the target. Considering existing methods typically fail to handle noisy partial observations, U-RED is designed to address this issue from two aspects. First, since one partial shape may correspond to multiple potential full shapes, the retrieval method must allow such an ambiguous one-to-many relationship. Thereby U-RED learns to project all possible full shapes of a partial target onto the surface of a unit sphere. Then during inference, each sampling on the sphere will yield a feasible retrieval. Second, since real-world partial observations usually contain noticeable noise, a reliable learned metric that measures the similarity between shapes is necessary for stable retrieval. In U-RED, we design a novel point-wise residual-guided metric that allows noise-robust comparison. Extensive experiments on the synthetic datasets PartNet, ComplementMe and the real-world dataset Scan2CAD demonstrate that U-RED surpasses existing state-of-the-art approaches by 47.3%, 16.7% and 31.6% respectively under Chamfer Distance.

* ICCV2023

Via

Access Paper or Ask Questions

Achieving RGB-D level Segmentation Performance from a Single ToF Camera

Jun 30, 2023

Pranav Sharma, Jigyasa Singh Katrolia, Jason Rambach, Bruno Mirbach, Didier Stricker, Juergen Seiler

Abstract:Depth is a very important modality in computer vision, typically used as complementary information to RGB, provided by RGB-D cameras. In this work, we show that it is possible to obtain the same level of accuracy as RGB-D cameras on a semantic segmentation task using infrared (IR) and depth images from a single Time-of-Flight (ToF) camera. In order to fuse the IR and depth modalities of the ToF camera, we introduce a method utilizing depth-specific convolutions in a multi-task learning framework. In our evaluation on an in-car segmentation dataset, we demonstrate the competitiveness of our method against the more costly RGB-D approaches.

Via

Access Paper or Ask Questions

RC-BEVFusion: A Plug-In Module for Radar-Camera Bird's Eye View Feature Fusion

May 25, 2023

Lukas Stäcker, Shashank Mishra, Philipp Heidenreich, Jason Rambach, Didier Stricker

Figure 1 for RC-BEVFusion: A Plug-In Module for Radar-Camera Bird's Eye View Feature Fusion

Figure 2 for RC-BEVFusion: A Plug-In Module for Radar-Camera Bird's Eye View Feature Fusion

Figure 3 for RC-BEVFusion: A Plug-In Module for Radar-Camera Bird's Eye View Feature Fusion

Figure 4 for RC-BEVFusion: A Plug-In Module for Radar-Camera Bird's Eye View Feature Fusion

Abstract:Radars and cameras belong to the most frequently used sensors for advanced driver assistance systems and automated driving research. However, there has been surprisingly little research on radar-camera fusion with neural networks. One of the reasons is a lack of large-scale automotive datasets with radar and unmasked camera data, with the exception of the nuScenes dataset. Another reason is the difficulty of effectively fusing the sparse radar point cloud on the bird's eye view (BEV) plane with the dense images on the perspective plane. The recent trend of camera-based 3D object detection using BEV features has enabled a new type of fusion, which is better suited for radars. In this work, we present RC-BEVFusion, a modular radar-camera fusion network on the BEV plane. We propose BEVFeatureNet, a novel radar encoder branch, and show that it can be incorporated into several state-of-the-art camera-based architectures. We show significant performance gains of up to 28% increase in the nuScenes detection score, which is an important step in radar-camera fusion research. Without tuning our model for the nuScenes benchmark, we achieve the best result among all published methods in the radar-camera fusion category.

Via

Access Paper or Ask Questions

OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection

Nov 02, 2022

Yongzhi Su, Yan Di, Fabian Manhardt, Guangyao Zhai, Jason Rambach, Benjamin Busam, Didier Stricker, Federico Tombari

Abstract:Despite monocular 3D object detection having recently made a significant leap forward thanks to the use of pre-trained depth estimators for pseudo-LiDAR recovery, such two-stage methods typically suffer from overfitting and are incapable of explicitly encapsulating the geometric relation between depth and object bounding box. To overcome this limitation, we instead propose OPA-3D, a single-stage, end-to-end, Occlusion-Aware Pixel-Wise Aggregation network that to jointly estimate dense scene depth with depth-bounding box residuals and object bounding boxes, allowing a two-stream detection of 3D objects, leading to significantly more robust detections. Thereby, the geometry stream denoted as the Geometry Stream, combines visible depth and depth-bounding box residuals to recover the object bounding box via explicit occlusion-aware optimization. In addition, a bounding box based geometry projection scheme is employed in an effort to enhance distance perception. The second stream, named as the Context Stream, directly regresses 3D object location and size. This novel two-stream representation further enables us to enforce cross-stream consistency terms which aligns the outputs of both streams, improving the overall performance. Extensive experiments on the public benchmark demonstrate that OPA-3D outperforms state-of-the-art methods on the main Car category, whilst keeping a real-time inference speed. We plan to release all codes and trained models soon.

Via

Access Paper or Ask Questions

Unsupervised Anomaly Detection from Time-of-Flight Depth Images

Apr 12, 2022

Pascal Schneider, Jason Rambach, Bruno Mirbach, Didier Stricker

Figure 1 for Unsupervised Anomaly Detection from Time-of-Flight Depth Images

Figure 2 for Unsupervised Anomaly Detection from Time-of-Flight Depth Images

Figure 3 for Unsupervised Anomaly Detection from Time-of-Flight Depth Images

Figure 4 for Unsupervised Anomaly Detection from Time-of-Flight Depth Images

Abstract:Video anomaly detection (VAD) addresses the problem of automatically finding anomalous events in video data. The primary data modalities on which current VAD systems work on are monochrome or RGB images. Using depth data in this context instead is still hardly explored in spite of depth images being a popular choice in many other computer vision research areas and the increasing availability of inexpensive depth camera hardware. We evaluate the application of existing autoencoder-based methods on depth video and propose how the advantages of using depth data can be leveraged by integration into the loss function. Training is done unsupervised using normal sequences without need for any additional annotations. We show that depth allows easy extraction of auxiliary information for scene analysis in the form of a foreground mask and demonstrate its beneficial effect on the anomaly detection performance through evaluation on a large public dataset, for which we are also the first ones to present results on.

Via

Access Paper or Ask Questions

ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation

Mar 29, 2022

Yongzhi Su, Mahdi Saleh, Torben Fetzer, Jason Rambach, Nassir Navab, Benjamin Busam, Didier Stricker, Federico Tombari

Figure 1 for ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation

Figure 2 for ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation

Figure 3 for ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation

Figure 4 for ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation

Abstract:Establishing correspondences from image to 3D has been a key task of 6DoF object pose estimation for a long time. To predict pose more accurately, deeply learned dense maps replaced sparse templates. Dense methods also improved pose estimation in the presence of occlusion. More recently researchers have shown improvements by learning object fragments as segmentation. In this work, we present a discrete descriptor, which can represent the object surface densely. By incorporating a hierarchical binary grouping, we can encode the object surface very efficiently. Moreover, we propose a coarse to fine training strategy, which enables fine-grained correspondence prediction. Finally, by matching predicted codes with object surface and using a PnP solver, we estimate the 6DoF pose. Results on the public LM-O and YCB-V datasets show major improvement over the state of the art w.r.t. ADD(-S) metric, even surpassing RGB-D based methods in some cases.

* CVPR2022 camera ready

Via

Access Paper or Ask Questions