Michael Ulrich

Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds

Feb 27, 2025

Mohamed Abdelsamad, Michael Ulrich, Claudius Gläser, Abhinav Valada

Abstract:Masked autoencoders (MAE) have shown tremendous potential for self-supervised learning (SSL) in vision and beyond. However, point clouds from LiDARs used in automated driving are particularly challenging for MAEs since large areas of the 3D volume are empty. Consequently, existing work suffers from leaking occupancy information into the decoder and has significant computational complexity, thereby limiting the SSL pre-training to only 2D bird's eye view encoders in practice. In this work, we propose the novel neighborhood occupancy MAE (NOMAE) that overcomes the aforementioned challenges by employing masked occupancy reconstruction only in the neighborhood of non-masked voxels. We incorporate voxel masking and occupancy reconstruction at multiple scales with our proposed hierarchical mask generation technique to capture features of objects of different sizes in the point cloud. NOMAEs are extremely flexible and can be directly employed for SSL in existing 3D architectures. We perform extensive evaluations on the nuScenes and Waymo Open datasets for the downstream perception tasks of semantic segmentation and 3D object detection, comparing with both discriminative and generative SSL methods. The results demonstrate that NOMAE sets the new state-of-the-art on multiple benchmarks for multiple point cloud perception tasks.
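The idea of reconstructing occupancy only in the neighborhood of non-masked voxels can be illustrated with a minimal sketch. This is one reading of the abstract, not the authors' implementation: the function name `neighborhood_occupancy_targets`, the cubic neighborhood, and the `radius` parameter are assumptions made for illustration only.

```python
from itertools import product


def neighborhood_occupancy_targets(occupied, masked, radius=1):
    """Build occupancy targets only around visible (non-masked) voxels.

    occupied: iterable of (x, y, z) integer voxel coordinates that contain points
    masked:   iterable of occupied voxels hidden from the encoder
    radius:   half-width of the cubic neighborhood (hypothetical parameter)

    Returns a dict mapping each query voxel to True (occupied) or False (empty).
    """
    occupied = set(occupied)
    visible = occupied - set(masked)

    # All offsets in the cube around a voxel, excluding the voxel itself.
    offsets = [
        o for o in product(range(-radius, radius + 1), repeat=3) if o != (0, 0, 0)
    ]

    targets = {}
    for x, y, z in visible:
        for dx, dy, dz in offsets:
            query = (x + dx, y + dy, z + dz)
            if query in visible:
                continue  # already known to the encoder, nothing to reconstruct
            targets[query] = query in occupied
    return targets


occupied = {(0, 0, 0), (1, 0, 0), (5, 5, 5)}
masked = {(1, 0, 0), (5, 5, 5)}
targets = neighborhood_occupancy_targets(occupied, masked)
```

In this toy example the masked voxel `(1, 0, 0)` becomes a positive target because it neighbors a visible voxel, while the distant masked voxel `(5, 5, 5)` is never queried, so the sketch produces no targets in the large empty regions of the volume.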

Revisiting Out-of-Distribution Detection in LiDAR-based 3D Object Detection

Apr 24, 2024

Michael Kösel, Marcel Schreiber, Michael Ulrich, Claudius Gläser, Klaus Dietmayer

Abstract:LiDAR-based 3D object detection has become an essential part of automated driving due to its ability to localize and classify objects precisely in 3D. However, object detectors face a critical challenge when dealing with unknown foreground objects, particularly those that were not present in their original training data. These out-of-distribution (OOD) objects can lead to misclassifications, posing a significant risk to the safety and reliability of automated vehicles. Currently, LiDAR-based OOD object detection has not been well studied. We address this problem by generating synthetic training data for OOD objects by perturbing known object categories. Our idea is that these synthetic OOD objects produce different responses in the feature map of an object detector compared to in-distribution (ID) objects. We then extract features using a pre-trained and fixed object detector and train a simple multilayer perceptron (MLP) to classify each detection as either ID or OOD. In addition, we propose a new evaluation protocol that allows the use of existing datasets without modifying the point cloud, ensuring a more authentic evaluation of real-world scenarios. The effectiveness of our method is validated through experiments on the newly proposed nuScenes OOD benchmark. The source code is available at https://github.com/uulm-mrm/mmood3d.

* Accepted for publication at the 2024 35th IEEE Intelligent Vehicles Symposium (IV 2024), June 2-5, 2024, in Jeju Island, Korea

Exploiting Sparsity in Automotive Radar Object Detection Networks

Aug 15, 2023

Marius Lippke, Maurice Quach, Sascha Braun, Daniel Köhler, Michael Ulrich, Bastian Bischoff, Wei Yap Tan

Abstract:Having precise perception of the environment is crucial for ensuring the secure and reliable functioning of autonomous driving systems. Radar object detection networks are one fundamental part of such systems. CNN-based object detectors showed good performance in this context, but they require large compute resources. This paper investigates sparse convolutional object detection networks, which combine powerful grid-based detection with low compute resources. We investigate radar specific challenges and propose sparse kernel point pillars (SKPP) and dual voxel point convolutions (DVPC) as remedies for the grid rendering and sparse backbone architectures. We evaluate our SKPP-DPVCN architecture on nuScenes, which outperforms the baseline by 5.89% and the previous state of the art by 4.19% in Car AP4.0. Moreover, SKPP-DPVCN reduces the average scale error (ASE) by 21.41% over the baseline.

Improved Multi-Scale Grid Rendering of Point Clouds for Radar Object Detection Networks

May 25, 2023

Daniel Köhler, Maurice Quach, Michael Ulrich, Frank Meinl, Bastian Bischoff, Holger Blume

Abstract:Architectures that first convert point clouds to a grid representation and then apply convolutional neural networks achieve good performance for radar-based object detection. However, the transfer from irregular point cloud data to a dense grid structure is often associated with a loss of information, due to the discretization and aggregation of points. In this paper, we propose a novel architecture, multi-scale KPPillarsBEV, that aims to mitigate the negative effects of grid rendering. Specifically, we propose a novel grid rendering method, KPBEV, which leverages the descriptive power of kernel point convolutions to improve the encoding of local point cloud contexts during grid rendering. In addition, we propose a general multi-scale grid rendering formulation to incorporate multi-scale feature maps into convolutional backbones of detection networks with arbitrary grid rendering methods. We perform extensive experiments on the nuScenes dataset and evaluate the methods in terms of detection performance and computational complexity. The proposed multi-scale KPPillarsBEV architecture outperforms the baseline by 5.37% and the previous state of the art by 2.88% in Car AP4.0 (average precision for a matching threshold of 4 meters) on the nuScenes validation set. Moreover, the proposed single-scale KPBEV grid rendering improves the Car AP4.0 by 2.90% over the baseline while maintaining the same inference speed.

* Accepted for presentation at the 2023 26th International Conference on Information Fusion (FUSION2023), June 27-30, 2023, in Charleston (SC), United States of America
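The information loss from discretization and aggregation described in the abstract can be seen in a minimal sketch of plain grid rendering. This is a generic bird's-eye-view max-pooling example, not the KPBEV method; the function name `render_bev_grid` and the single scalar feature per point are assumptions for illustration.

```python
import math


def render_bev_grid(points, cell_size):
    """Render points into a sparse bird's-eye-view grid by max aggregation.

    points:    iterable of (x, y, feature) tuples, positions in meters
    cell_size: edge length of a square grid cell in meters

    Returns a dict mapping (column, row) cell indices to the max feature.
    """
    grid = {}
    for x, y, feature in points:
        # Discretization: the exact position inside the cell is discarded.
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        # Aggregation: several points in one cell collapse to a single value.
        grid[cell] = max(grid.get(cell, feature), feature)
    return grid


points = [(0.1, 0.1, 1.0), (0.9, 0.4, 3.0), (1.2, 0.1, 2.0)]
grid = render_bev_grid(points, cell_size=1.0)
```

Here the first two points fall into the same cell, so only the larger feature survives and their relative positions are lost, which is the effect that a richer encoding of local point context during grid rendering aims to reduce.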

DeepFusion: A Robust and Modular 3D Object Detector for Lidars, Cameras and Radars

Sep 27, 2022

Florian Drews, Di Feng, Florian Faion, Lars Rosenbaum, Michael Ulrich, Claudius Gläser

Abstract:We propose DeepFusion, a modular multi-modal architecture to fuse lidars, cameras and radars in different combinations for 3D object detection. Specialized feature extractors take advantage of each modality and can be exchanged easily, making the approach simple and flexible. Extracted features are transformed into bird's-eye-view as a common representation for fusion. Spatial and semantic alignment is performed prior to fusing modalities in the feature space. Finally, a detection head exploits rich multi-modal features for improved 3D detection performance. Experimental results for lidar-camera, lidar-camera-radar and camera-radar fusion show the flexibility and effectiveness of our fusion approach. In the process, we study the largely unexplored task of faraway car detection up to 225 meters, showing the benefits of our lidar-camera fusion. Furthermore, we investigate the required density of lidar points for 3D object detection and illustrate implications at the example of robustness against adverse weather conditions. Moreover, ablation studies on our camera-radar fusion highlight the importance of accurate depth estimation.

Self-Supervised Velocity Estimation for Automotive Radar Object Detection Networks

Jul 07, 2022

Daniel Niederlöhner, Michael Ulrich, Sascha Braun, Daniel Köhler, Florian Faion, Claudius Gläser, André Treptow, Holger Blume

Abstract:This paper presents a method to learn the Cartesian velocity of objects using an object detection network on automotive radar data. The proposed method is self-supervised in terms of generating its own training signal for the velocities. Labels are only required for single-frame, oriented bounding boxes (OBBs). Labels for the Cartesian velocities or contiguous sequences, which are expensive to obtain, are not required. The general idea is to pre-train an object detection network without velocities using single-frame OBB labels, and then exploit the network's OBB predictions on unlabelled data for velocity training. In detail, the network's OBB predictions of the unlabelled frames are updated to the timestamp of a labelled frame using the predicted velocities and the distances between the updated OBBs of the unlabelled frame and the OBB predictions of the labelled frame are used to generate a self-supervised training signal for the velocities. The detection network architecture is extended by a module to account for the temporal relation of multiple scans and a module to represent the radars' radial velocity measurements explicitly. A two-step approach of first training only OBB detection, followed by training OBB detection and velocities is used. Further, a pre-training with pseudo-labels generated from radar radial velocity measurements bootstraps the self-supervised method of this paper. Experiments on the publicly available nuScenes dataset show that the proposed method almost reaches the velocity estimation performance of a fully supervised training, but does not require expensive velocity labels. Furthermore, we outperform a baseline method which uses only radial velocity measurements as labels.

* Accepted for presentation at the 2022 33rd IEEE Intelligent Vehicles Symposium (IV) (IV 2022), June 5-9, 2022, in Aachen, Germany
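The self-supervised training signal described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's loss: boxes are reduced to 2D centers, a constant-velocity update is used, the distance is taken to be Euclidean, and the names `propagate_center` and `velocity_consistency_loss` are hypothetical.

```python
import math


def propagate_center(center, velocity, dt):
    """Move a box center to another timestamp with a constant-velocity model."""
    return (center[0] + velocity[0] * dt, center[1] + velocity[1] * dt)


def velocity_consistency_loss(center_unlabelled, velocity, dt, center_labelled):
    """Distance between the propagated box and the box at the labelled frame.

    center_unlabelled: predicted (x, y) center in the unlabelled frame
    velocity:          predicted (vx, vy) in meters per second
    dt:                time from the unlabelled to the labelled frame in seconds
    center_labelled:   predicted (x, y) center in the labelled frame
    """
    moved = propagate_center(center_unlabelled, velocity, dt)
    return math.dist(moved, center_labelled)


# An object that moved 5 m along x within 0.5 s.
good = velocity_consistency_loss((0.0, 0.0), (10.0, 0.0), 0.5, (5.0, 0.0))
bad = velocity_consistency_loss((0.0, 0.0), (0.0, 0.0), 0.5, (5.0, 0.0))
```

A correct velocity estimate makes the propagated box coincide with the box at the labelled frame, so `good` is zero, while a wrong estimate leaves a residual distance that can serve as a training signal without any velocity labels.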

Improved Orientation Estimation and Detection with Hybrid Object Detection Networks for Automotive Radar

May 03, 2022

Michael Ulrich, Sascha Braun, Daniel Köhler, Daniel Niederlöhner, Florian Faion, Claudius Gläser, Holger Blume

Abstract:This paper presents novel hybrid architectures that combine grid- and point-based processing to improve the detection performance and orientation estimation of radar-based object detection networks. Purely grid-based detection models operate on a bird's-eye-view (BEV) projection of the input point cloud. These approaches suffer from a loss of detailed information through the discrete grid resolution. This applies in particular to radar object detection, where relatively coarse grid resolutions are commonly used to account for the sparsity of radar point clouds. In contrast, point-based models are not affected by this problem as they continuously process point clouds. However, they generally exhibit worse detection performances than grid-based methods. We show that a point-based model can extract neighborhood features, leveraging the exact relative positions of points, before grid rendering. This has significant benefits for a following convolutional detection backbone. In experiments on the public nuScenes dataset our hybrid architecture achieves improvements in terms of detection performance and orientation estimates over networks from previous literature.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

DeepReflecs: Deep Learning for Automotive Object Classification with Radar Reflections

Oct 19, 2020

Michael Ulrich, Claudius Gläser, Fabian Timm

Abstract:This paper presents a novel object type classification method for automotive applications which uses deep learning with radar reflections. The method provides object class information such as pedestrian, cyclist, car, or non-obstacle. The method is both powerful and efficient, by using a light-weight deep learning approach on reflection level radar data. It fills the gap between low-performant methods of handcrafted features and high-performant methods with convolutional neural networks. The proposed network exploits the specific characteristics of radar reflection data: it handles unordered lists of arbitrary length as input and it combines the extraction of both local and global features. In experiments with real data the proposed network outperforms existing methods of handcrafted or learned features. An ablation study analyzes the impact of the proposed global context layer.

* preprint, under review
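One common way to handle unordered lists of arbitrary length while combining local and global features is to pool over all reflections with a symmetric function and append the result to each reflection. The sketch below illustrates that general pattern only; it is an assumption that max pooling is used here, and the names `global_max` and `add_global_context` are hypothetical rather than taken from the paper.

```python
def global_max(features):
    """Elementwise max over all reflections (independent of their order)."""
    return [max(column) for column in zip(*features)]


def add_global_context(features):
    """Append the pooled global vector to every per-reflection feature vector."""
    pooled = global_max(features)
    return [list(local) + pooled for local in features]


# Two reflections with two features each; any number of reflections works.
reflections = [[1.0, 5.0], [3.0, 2.0]]
pooled = global_max(reflections)
with_context = add_global_context(reflections)
```

Because the max is symmetric, reordering the reflections does not change the pooled vector, and the input list may contain any number of reflections.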
