Chengjie Huang

MFSeg: Efficient Multi-frame 3D Semantic Segmentation

May 07, 2025

Chengjie Huang, Krzysztof Czarnecki

Abstract: We propose MFSeg, an efficient multi-frame 3D semantic segmentation framework. By aggregating point cloud sequences at the feature level and regularizing the feature extraction and aggregation process, MFSeg reduces computational overhead while maintaining high accuracy. Moreover, by employing a lightweight MLP-based point decoder, our method eliminates the need to upsample redundant points from past frames. Experiments on the nuScenes and Waymo datasets show that MFSeg outperforms existing methods, demonstrating its effectiveness and efficiency.

* ICRA 2025

VADet: Multi-frame LiDAR 3D Object Detection using Variable Aggregation

Nov 20, 2024

Chengjie Huang, Vahdat Abdelzad, Sean Sedwards, Krzysztof Czarnecki

Abstract: Input aggregation is a simple technique used by state-of-the-art LiDAR 3D object detectors to improve detection. However, increasing aggregation is known to yield diminishing returns and even degrade performance, because objects respond differently to the number of aggregated frames. To address this limitation, we propose an efficient adaptive method, which we call Variable Aggregation Detection (VADet). Instead of aggregating the entire scene using a fixed number of frames, VADet performs aggregation per object, with the number of frames determined by an object's observed properties, such as speed and point density. VADet thus reduces the inherent trade-offs of fixed aggregation and is not architecture specific. To demonstrate its benefits, we apply VADet to three popular single-stage detectors and achieve state-of-the-art performance on the Waymo dataset.

* Accepted by WACV 2025
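
The per-object idea in the VADet abstract can be illustrated with a toy rule. The abstract does not state how speed and point density map to a frame count, so the function below, its name `choose_num_frames`, and all thresholds (`max_frames`, `dense_points`, `fast_speed`) are hypothetical assumptions for illustration, not the authors' method.

```python
def choose_num_frames(speed_mps, num_points, max_frames=8,
                      dense_points=200, fast_speed=10.0):
    """Toy heuristic (hypothetical): pick how many frames to aggregate for one object.

    Sparse, slow objects get more frames; dense or fast objects get fewer.
    """
    # Sparsity is 1.0 for an object with no points and 0.0 once it is "dense".
    sparsity = max(0.0, 1.0 - num_points / dense_points)
    # Slowness is 1.0 for a stationary object and 0.0 at or above fast_speed.
    slowness = max(0.0, 1.0 - speed_mps / fast_speed)
    # Always use the current frame, then add past frames in proportion to both factors.
    extra = round((max_frames - 1) * sparsity * slowness)
    return 1 + extra


if __name__ == "__main__":
    print(choose_num_frames(0.0, 0))     # stationary and very sparse
    print(choose_num_frames(20.0, 10))   # fast object
    print(choose_num_frames(5.0, 100))   # in between
```

Under these assumed thresholds, a stationary object with no points receives the full frame budget, while a fast or already dense object falls back to a single frame.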

SSL-Interactions: Pretext Tasks for Interactive Trajectory Prediction

Jan 15, 2024

Prarthana Bhattacharyya, Chengjie Huang, Krzysztof Czarnecki

Abstract: This paper addresses motion forecasting in multi-agent environments, which is pivotal for ensuring the safety of autonomous vehicles. Traditional as well as recent data-driven marginal trajectory prediction methods struggle to properly learn non-linear agent-to-agent interactions. We present SSL-Interactions, which proposes pretext tasks to enhance interaction modeling for trajectory prediction. We introduce four interaction-aware pretext tasks to encapsulate various aspects of agent interactions: range gap prediction, closest distance prediction, direction of movement prediction, and type of interaction prediction. We further propose an approach to curate interaction-heavy scenarios from datasets. This curated data has two advantages: it provides a stronger learning signal to the interaction model, and it facilitates the generation of pseudo-labels for interaction-centric pretext tasks. We also propose three new metrics specifically designed to evaluate predictions in interactive scenes. Our empirical evaluations indicate that SSL-Interactions outperforms state-of-the-art motion forecasting methods quantitatively, with up to 8% improvement, and qualitatively for interaction-heavy scenarios.

* 13 pages, 5 figures, submitted to IV-2024
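
As a rough illustration of how a pseudo-label for a task like closest distance prediction might be derived from recorded trajectories, the sketch below computes the minimum distance between two agents over time-aligned waypoints. The abstract does not define the label this way; the function name `closest_distance` and the assumption that both trajectories are sampled at the same timestamps are hypothetical.

```python
import math


def closest_distance(traj_a, traj_b):
    """Minimum Euclidean distance between two agents over time-aligned waypoints.

    Assumes (hypothetically) that both trajectories are lists of (x, y) points
    sampled at the same timestamps.
    """
    if not traj_a or len(traj_a) != len(traj_b):
        raise ValueError("trajectories must be non-empty and time-aligned")
    # Compare positions at the same time step and keep the smallest gap.
    return min(math.dist(p, q) for p, q in zip(traj_a, traj_b))


if __name__ == "__main__":
    agent_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    agent_b = [(0.0, 3.0), (1.0, 1.0), (2.0, 4.0)]
    print(closest_distance(agent_a, agent_b))
```

A label of this kind needs no manual annotation, which is the property that makes a quantity suitable for a self-supervised pretext task.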

SOAP: Cross-sensor Domain Adaptation for 3D Object Detection Using Stationary Object Aggregation Pseudo-labelling

Jan 08, 2024

Chengjie Huang, Vahdat Abdelzad, Sean Sedwards, Krzysztof Czarnecki

Abstract: We consider the problem of cross-sensor domain adaptation in the context of LiDAR-based 3D object detection and propose Stationary Object Aggregation Pseudo-labelling (SOAP) to generate high-quality pseudo-labels for stationary objects. In contrast to the current state-of-the-art in-domain practice of aggregating just a few input scans, SOAP aggregates entire sequences of point clouds at the input level to reduce the sensor domain gap. Then, by means of what we call quasi-stationary training and spatial consistency post-processing, the SOAP model generates accurate pseudo-labels for stationary objects, closing at least 30.3% of the domain gap compared to few-frame detectors. Our results also show that state-of-the-art domain adaptation approaches can achieve even greater performance in combination with SOAP, in both the unsupervised and semi-supervised settings.

* Accepted by WACV 2024

Towards Object Re-Identification from Point Clouds for 3D MOT

May 17, 2023

Benjamin Thérien, Chengjie Huang, Adrian Chow, Krzysztof Czarnecki

Abstract: In this work, we study the problem of object re-identification (ReID) in a 3D multi-object tracking (MOT) context, by learning to match pairs of objects from cropped (e.g., using their predicted 3D bounding boxes) point cloud observations. We are not concerned with SOTA performance for 3D MOT, however. Instead, we seek to answer the following question: In a realistic tracking-by-detection context, how does object ReID from point clouds perform relative to ReID from images? To enable such a study, we propose a lightweight matching head that can be concatenated to any set or sequence processing backbone (e.g., PointNet or ViT), creating a family of comparable object ReID networks for both modalities. Run in siamese style, our proposed point-cloud ReID networks can make thousands of pairwise comparisons in real time (10 Hz). Our findings demonstrate that their performance increases with higher sensor resolution and approaches that of image ReID when observations are sufficiently dense. Additionally, we investigate our network's ability to enhance 3D MOT, showing that our point-cloud ReID networks can successfully re-identify objects that led a strong motion-based tracker into error. To our knowledge, we are the first to study real-time object re-identification from point clouds in a 3D multi-object tracking context.

Out-of-Distribution Detection for LiDAR-based 3D Object Detection

Sep 28, 2022

Chengjie Huang, Van Duong Nguyen, Vahdat Abdelzad, Christopher Gus Mannes, Luke Rowe, Benjamin Therien, Rick Salay, Krzysztof Czarnecki

Abstract: 3D object detection is an essential part of automated driving, and deep neural networks (DNNs) have achieved state-of-the-art performance for this task. However, deep models are notorious for assigning high confidence scores to out-of-distribution (OOD) inputs, that is, inputs that are not drawn from the training distribution. Detecting OOD inputs is challenging and essential for the safe deployment of models. OOD detection has been studied extensively for the classification task, but it has not received enough attention for the object detection task, specifically LiDAR-based 3D object detection. In this paper, we focus on the detection of OOD inputs for LiDAR-based 3D object detection. We formulate what OOD inputs mean for object detection and propose to adapt several OOD detection methods for object detection. We accomplish this with our proposed feature extraction method. To evaluate OOD detection methods, we develop a simple but effective technique for generating OOD objects for a given object detection model. Our evaluation based on the KITTI dataset shows that different OOD detection methods have biases toward detecting specific OOD objects. This emphasizes the importance of combined OOD detection methods and of more research in this direction.

* Accepted at ITSC 2022

SSL-Lanes: Self-Supervised Learning for Motion Forecasting in Autonomous Driving

Jun 28, 2022

Prarthana Bhattacharyya, Chengjie Huang, Krzysztof Czarnecki

Abstract: Self-supervised learning (SSL) is an emerging technique that has been successfully employed to train convolutional neural networks (CNNs) and graph neural networks (GNNs) for more transferable, generalizable, and robust representation learning. However, its potential in motion forecasting for autonomous driving has rarely been explored. In this study, we report the first systematic exploration and assessment of incorporating self-supervision into motion forecasting. First, we propose and investigate four novel self-supervised learning tasks for motion forecasting, with theoretical rationale and quantitative and qualitative comparisons on the challenging large-scale Argoverse dataset. Second, we point out that our auxiliary SSL-based learning setup not only outperforms forecasting methods that use transformers, complicated fusion mechanisms, and sophisticated online dense goal candidate optimization algorithms in terms of accuracy, but also has low inference time and architectural complexity. Lastly, we conduct several experiments to understand why SSL improves motion forecasting. Code is open-sourced at https://github.com/AutoVision-cloud/SSL-Lanes.

* 16 pages, 7 figures

LiDAR-MIMO: Efficient Uncertainty Estimation for LiDAR-based 3D Object Detection

Jun 01, 2022

Matthew Pitropov, Chengjie Huang, Vahdat Abdelzad, Krzysztof Czarnecki, Steven Waslander

Abstract: The estimation of uncertainty in robotic vision, such as 3D object detection, is an essential component in developing safe autonomous systems aware of their own performance. However, the deployment of current uncertainty estimation methods in 3D object detection remains challenging due to timing and computational constraints. To tackle this issue, we propose LiDAR-MIMO, an adaptation of the multi-input multi-output (MIMO) uncertainty estimation method to the LiDAR-based 3D object detection task. Our method modifies the original MIMO by performing multi-input at the feature level to ensure the detection, uncertainty estimation, and runtime performance benefits are retained despite the limited capacity of the underlying detector and the large computational costs of point cloud processing. We compare LiDAR-MIMO with MC dropout and ensembles as baselines and show comparable uncertainty estimation results with only a small number of output heads. Further, LiDAR-MIMO can be configured to be twice as fast as MC dropout and ensembles, while achieving higher mAP than MC dropout and approaching that of ensembles.

* 8 pages, 4 figures and 5 tables. Accepted in IEEE IV 2022

The missing link: Developing a safety case for perception components in automated driving

Aug 30, 2021

Rick Salay, Krzysztof Czarnecki, Hiroshi Kuwajima, Hirotoshi Yasuoka, Toshihiro Nakae, Vahdat Abdelzad, Chengjie Huang, Maximilian Kahn, Van Duong Nguyen

Abstract: Safety assurance is a central concern for the development and societal acceptance of automated driving (AD) systems. Perception is a key aspect of AD that relies heavily on Machine Learning (ML). Despite the known challenges with the safety assurance of ML-based components, proposals have recently emerged for unit-level safety cases addressing these components. Unfortunately, AD safety cases express safety requirements at the system level, and these efforts are missing the critical linking argument connecting safety requirements at the system level to component performance requirements at the unit level. In this paper, we propose a generic template for such a linking argument specifically tailored for perception components. The template takes a deductive and formal approach to define strong traceability between levels. We demonstrate the applicability of the template with a detailed case study and discuss its use as a tool to support incremental development of perception components.

Self-Attention Based Context-Aware 3D Object Detection

Jan 07, 2021

Prarthana Bhattacharyya, Chengjie Huang, Krzysztof Czarnecki

Abstract: Most existing point-cloud-based 3D object detectors use convolution-like operators to process information in a local neighbourhood with fixed-weight kernels and aggregate global context hierarchically. However, recent work on non-local neural networks and self-attention for 2D vision has shown that explicitly modeling global context and long-range interactions between positions can lead to more robust and competitive models. In this paper, we explore two variants of self-attention for contextual modeling in 3D object detection by augmenting convolutional features with self-attention features. We first incorporate the pairwise self-attention mechanism into the current state-of-the-art BEV, voxel, and point-based detectors and show consistent improvement over strong baseline models while significantly reducing their parameter footprint and computational cost. We also propose a self-attention variant that samples a subset of the most representative features by learning deformations over randomly sampled locations. This not only allows us to scale explicit global contextual modeling to larger point clouds, but also leads to more discriminative and informative feature descriptors. Our method can be flexibly applied to most state-of-the-art detectors, increasing accuracy while improving parameter and compute efficiency. We achieve new state-of-the-art detection performance on the KITTI and nuScenes datasets. Code is available at https://github.com/AutoVision-cloud/SA-Det3D.

* 17 pages, 9 figures
