B Ravi Kiran

CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation

Mar 12, 2025

Hariprasath Govindarajan, Maciej K. Wozniak, Marvin Klingner, Camille Maurice, B Ravi Kiran, Senthil Yogamani

Abstract: Vision foundation models (VFMs) such as DINO have led to a paradigm shift in 2D camera-based perception towards extracting generalized features to support many downstream tasks. Recent works introduce self-supervised cross-modal knowledge distillation (KD) as a way to transfer these powerful generalization capabilities into 3D LiDAR-based models. However, they rely on highly complex distillation losses or pseudo-semantic maps, or limit KD to features useful for semantic segmentation only. In this work, we propose CleverDistiller, a self-supervised, cross-modal 2D-to-3D KD framework introducing a set of simple yet effective design choices. Unlike contrastive approaches that rely on complex loss designs, our method employs a direct feature similarity loss in combination with a multilayer perceptron (MLP) projection head to allow the 3D network to learn complex semantic dependencies throughout the projection. Crucially, our approach does not depend on pseudo-semantic maps, allowing for direct knowledge transfer from a VFM without explicit semantic supervision. Additionally, we introduce the auxiliary self-supervised spatial task of occupancy prediction to enhance the semantic knowledge, obtained from a VFM through KD, with 3D spatial reasoning capabilities. Experiments on standard autonomous driving benchmarks for 2D-to-3D KD demonstrate that CleverDistiller achieves state-of-the-art performance in both semantic segmentation and 3D object detection (3DOD), with gains of up to 10% mIoU, especially when fine-tuning on very small amounts of data, showing the effectiveness of our simple yet powerful KD strategy.
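
The abstract above describes a direct feature similarity loss applied after an MLP projection head. The following is a minimal sketch of that idea, not the authors' implementation. It assumes cosine similarity as the similarity measure, a two-layer MLP with a ReLU, and point and pixel features that have already been paired; none of these details are stated in the abstract, and all function names are hypothetical.

```python
import math


def mlp_project(x, w1, b1, w2, b2):
    """Two-layer MLP head: linear -> ReLU -> linear.

    w1 and w2 are lists of weight rows, one row per output unit.
    """
    hidden = [max(0.0, sum(xi * wi for xi, wi in zip(x, row)) + b)
              for row, b in zip(w1, b1)]
    return [sum(hi * wi for hi, wi in zip(hidden, row)) + b
            for row, b in zip(w2, b2)]


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two vectors, guarded against zero norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (max(norm_a, eps) * max(norm_b, eps))


def similarity_loss(point_feats, image_feats, w1, b1, w2, b2):
    """Mean of (1 - cosine similarity) over paired features.

    point_feats: 3D student features, one vector per LiDAR point.
    image_feats: 2D teacher features sampled at the matching pixels
                 (the pairing itself is assumed to be given).
    """
    losses = [1.0 - cosine_similarity(mlp_project(p, w1, b1, w2, b2), t)
              for p, t in zip(point_feats, image_feats)]
    return sum(losses) / len(losses)
```

The loss is zero when every projected point feature points in the same direction as its paired image feature, and grows as the two diverge.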

LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping

May 29, 2024

Nikhil Gosala, Kürsat Petek, B Ravi Kiran, Senthil Yogamani, Paulo Drews-Jr, Wolfram Burgard, Abhinav Valada

Abstract: Semantic Bird's Eye View (BEV) maps offer a rich representation with strong occlusion reasoning for various decision-making tasks in autonomous driving. However, most BEV mapping approaches employ a fully supervised learning paradigm that relies on large amounts of human-annotated BEV ground truth data. In this work, we address this limitation by proposing the first unsupervised representation learning approach to generate semantic BEV maps from a monocular frontal view (FV) image in a label-efficient manner. Our approach pretrains the network to independently reason about scene geometry and scene semantics using two disjoint neural pathways in an unsupervised manner and then finetunes it for the task of semantic BEV mapping using only a small fraction of labels in the BEV. We achieve label-free pretraining by exploiting spatial and temporal consistency of FV images to learn scene geometry while relying on a novel temporal masked autoencoder formulation to encode the scene representation. Extensive evaluations on the KITTI-360 and nuScenes datasets demonstrate that our approach performs on par with the existing state-of-the-art approaches while using only 1% of BEV labels and no additional labeled data.

* 23 pages, 5 figures

BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation

Mar 18, 2024

Jonas Schramm, Niclas Vödisch, Kürsat Petek, B Ravi Kiran, Senthil Yogamani, Wolfram Burgard, Abhinav Valada

Abstract: Semantic scene segmentation from a bird's-eye-view (BEV) perspective plays a crucial role in facilitating planning and decision-making for mobile robots. Although recent vision-only methods have demonstrated notable advancements in performance, they often struggle under adverse illumination conditions such as rain or nighttime. While active sensors offer a solution to this challenge, the prohibitively high cost of LiDARs remains a limiting factor. Fusing camera data with automotive radars offers a less expensive alternative but has received less attention in prior research. In this work, we aim to advance this promising avenue by introducing BEVCar, a novel approach for joint BEV object and map segmentation. The core novelty of our approach lies in first learning a point-based encoding of raw radar data, which is then leveraged to efficiently initialize the lifting of image features into the BEV space. We perform extensive experiments on the nuScenes dataset and demonstrate that BEVCar outperforms the current state of the art. Moreover, we show that incorporating radar information significantly enhances robustness in challenging environmental conditions and improves segmentation performance for distant objects. To foster future research, we provide the weather split of the nuScenes dataset used in our experiments, along with our code and trained models at http://bevcar.cs.uni-freiburg.de.

Evaluating the effect of data augmentation and BALD heuristics on distillation of Semantic-KITTI dataset

Feb 21, 2023

Anh Duong, Alexandre Almin, Léo Lemarié, B Ravi Kiran

Abstract: Active Learning (AL) has remained relatively unexplored for LiDAR perception tasks in autonomous driving datasets. In this study we evaluate Bayesian active learning methods applied to the task of dataset distillation or core subset selection (finding a subset with near-equivalent performance to the full dataset). We also study the effect of applying data augmentation (DA) within Bayesian AL based dataset distillation. We perform these experiments on the full Semantic-KITTI dataset, extending our earlier work, which used only one quarter of the same dataset. Adding DA and BALD has a negative impact on labeling efficiency and thus on the capacity to distill datasets. We demonstrate key issues in designing a functional AL framework and finally conclude with a review of challenges in real-world active learning.

* Submitted to VISAPP Springer book extension. arXiv admin note: substantial text overlap with arXiv:2202.02661
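
BALD (Bayesian Active Learning by Disagreement) scores a sample by the mutual information between its predicted label and the model parameters. A common estimate is the entropy of the mean prediction minus the mean entropy of the individual predictions over several stochastic forward passes. The sketch below computes that estimate for one sample. It assumes the stochastic predictions (for example from MC dropout) are already available as probability vectors, and it is not taken from the paper; how per-point scores are aggregated per LiDAR scan is left out.

```python
import math


def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p + eps) for p in probs)


def bald_score(mc_probs):
    """BALD estimate for one sample.

    mc_probs: list of T class-probability vectors, one per stochastic
              forward pass (assumed input, e.g. from MC dropout).
    Returns the entropy of the mean prediction minus the mean entropy
    of the individual predictions.
    """
    num_passes = len(mc_probs)
    num_classes = len(mc_probs[0])
    mean_probs = [sum(p[c] for p in mc_probs) / num_passes
                  for c in range(num_classes)]
    mean_entropy = sum(entropy(p) for p in mc_probs) / num_passes
    return entropy(mean_probs) - mean_entropy
```

The score is near zero when all passes agree, even if each pass is uncertain, and is largest when confident passes disagree with each other.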

Navya3DSeg -- Navya 3D Semantic Segmentation Dataset & split generation for autonomous vehicles

Feb 16, 2023

Alexandre Almin, Léo Lemarié, Anh Duong, B Ravi Kiran

Abstract: Autonomous driving (AD) perception today relies heavily on deep learning based architectures requiring large-scale annotated datasets with their associated costs for curation and annotation. 3D semantic data are useful for core perception tasks such as obstacle detection and ego-vehicle localization. We propose a new dataset, Navya 3D Segmentation (Navya3DSeg), with a diverse label space corresponding to a large-scale production-grade operational domain, including rural, urban, industrial sites and universities from 13 countries. It contains 23 labeled sequences and 25 supplementary sequences without labels, designed to explore self-supervised and semi-supervised semantic segmentation benchmarks on point clouds. We also propose a novel method for sequential dataset split generation based on iterative multi-label stratification, which we show achieves a +1.2% mIoU improvement over the original split proposed by the SemanticKITTI dataset. We perform a complete benchmark for the semantic segmentation task with state-of-the-art methods. Finally, we demonstrate an active learning (AL) based dataset distillation framework. We introduce a novel heuristic-free sampling method called distance sampling in the context of AL. A detailed presentation on the dataset is available at https://www.youtube.com/watch?v=5m6ALIs-s20 .

* Submitted to RA-L. Version with supplementary materials

Self-Supervised 3D Monocular Object Detection by Recycling Bounding Boxes

Jun 25, 2022

Sugirtha T, Sridevi M, Khailash Santhakumar, Hao Liu, B Ravi Kiran, Thomas Gauthier, Senthil Yogamani

Abstract: Modern object detection architectures are moving towards employing self-supervised learning (SSL) to improve detection performance with related pretext tasks. Pretext tasks for monocular 3D object detection have not yet been explored in the literature. This paper studies the application of established self-supervised bounding box recycling by labeling random windows as the pretext task. The classifier head of the 3D detector is trained to classify random windows containing different proportions of the ground truth objects, thus handling the foreground-background imbalance. We evaluate the pretext task using the RTM3D detection model as baseline, with and without the application of data augmentation. We demonstrate improvements of 2-3% in mAP 3D and 0.9-1.5% in BEV scores using SSL over the baseline scores. We propose the inverse class frequency re-weighted (ICFW) mAP score, which highlights improvements in detection for low-frequency classes in a class-imbalanced dataset with long tails. We demonstrate improvements in both the ICFW mAP 3D and BEV scores, which take into account the class imbalance in the KITTI validation dataset. We see a 4-5% increase in the ICFW metric with the pretext task.

* Published at ICCVW-SSLAD 2021. arXiv admin note: substantial text overlap with arXiv:2104.10786
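
The abstract names an inverse class frequency re-weighted (ICFW) mAP but does not give its formula. One plausible reading, sketched below purely as an assumption, weights each class's AP by the inverse of its instance count and normalizes the weights to sum to one. The function name and the numbers in the usage note are illustrative and are not taken from the paper or from KITTI.

```python
def icfw_map(ap_per_class, class_counts):
    """Inverse-class-frequency weighted mean of per-class AP.

    ap_per_class: dict mapping class name -> average precision in [0, 1].
    class_counts: dict mapping class name -> number of instances.
    Assumed weighting: w_c = (1 / n_c) / sum_k (1 / n_k).
    """
    inverse = {c: 1.0 / class_counts[c] for c in ap_per_class}
    total = sum(inverse.values())
    return sum(ap_per_class[c] * inverse[c] / total for c in ap_per_class)
```

With made-up values `{"car": 0.8, "pedestrian": 0.4}` and counts `{"car": 300, "pedestrian": 100}`, the plain mean is 0.6 while this weighted score is 0.5, because the rarer class receives three quarters of the weight. A change in the rare class's AP therefore moves the score more than the same change in the frequent class.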

Simulation-to-Reality domain adaptation for offline 3D object annotation on pointclouds with correlation alignment

Feb 06, 2022

Weishuang Zhang, B Ravi Kiran, Thomas Gauthier, Yanis Mazouz, Theo Steger

Abstract: Annotating objects with 3D bounding boxes in LiDAR pointclouds is a costly human-driven process in an autonomous driving perception system. In this paper, we present a method to semi-automatically annotate real-world pointclouds collected by deployment vehicles using simulated data. We train a 3D object detector model on labeled simulated data from CARLA jointly with real-world pointclouds from our target vehicle. The supervised object detection loss is augmented with a CORAL loss term to reduce the distance between labeled simulated and unlabeled real pointcloud feature representations. The goal here is to learn representations that are invariant to simulated (labeled) and real-world (unlabeled) target domains. We also provide an updated survey on domain adaptation methods for pointclouds.
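
The CORAL (correlation alignment) term mentioned above penalizes the difference between the feature covariances of the two domains. The sketch below follows the commonly used Deep CORAL form, the squared Frobenius norm of the covariance difference divided by 4d², where d is the feature dimension. Which detector layer the features come from and how the term is weighted against the detection loss are not stated in the abstract, so the inputs here are simply assumed to be lists of feature vectors.

```python
def covariance(feats):
    """Unbiased sample covariance matrix of a list of feature vectors."""
    n = len(feats)
    d = len(feats[0])
    mean = [sum(row[j] for row in feats) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in feats]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(d)] for i in range(d)]


def coral_loss(source_feats, target_feats):
    """Squared Frobenius distance between covariances, scaled by 1/(4 d^2).

    source_feats: features from labeled simulated scans (assumed input).
    target_feats: features from unlabeled real scans (assumed input).
    """
    d = len(source_feats[0])
    cov_s = covariance(source_feats)
    cov_t = covariance(target_feats)
    squared_diff = sum((cov_s[i][j] - cov_t[i][j]) ** 2
                       for i in range(d) for j in range(d))
    return squared_diff / (4.0 * d * d)
```

In training, this value would be added to the supervised detection loss with some weighting factor, so that the network is pushed towards features whose second-order statistics match across simulated and real data.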

LiDAR dataset distillation within bayesian active learning framework: Understanding the effect of data augmentation

Feb 06, 2022

Ngoc Phuong Anh Duong, Alexandre Almin, Léo Lemarié, B Ravi Kiran

Abstract: Autonomous driving (AD) datasets have progressively grown in size in the past few years to enable better deep representation learning. Active learning (AL) has recently regained attention as a way to reduce annotation costs and dataset size. AL has remained relatively unexplored for AD datasets, especially on point cloud data from LiDARs. This paper performs a principled evaluation of AL based dataset distillation on one quarter (1/4) of the large Semantic-KITTI dataset. Further on, the gains in model performance due to data augmentation (DA) are demonstrated across different subsets of the AL loop. We also demonstrate how DA improves the selection of informative samples to annotate. We observe that data augmentation achieves full dataset accuracy using only 60% of samples from the selected dataset configuration. This yields faster training and subsequent savings in annotation costs.

* Accepted at VISAPP 2022

Exploring 2D Data Augmentation for 3D Monocular Object Detection

Apr 21, 2021

Sugirtha T, Sridevi M, Khailash Santhakumar, B Ravi Kiran, Thomas Gauthier, Senthil Yogamani

Abstract: Data augmentation is a key component of CNN based image recognition tasks like object detection. However, it is relatively less explored for 3D object detection. Many standard 2D object detection data augmentation techniques do not extend to 3D boxes. Extending these data augmentations to 3D object detection requires adapting the 3D geometry of the input scene and synthesizing new viewpoints. This requires accurate depth information of the scene, which may not always be available. In this paper, we evaluate existing 2D data augmentations and propose two novel augmentations for monocular 3D detection without a requirement for novel view synthesis. We evaluate these augmentations on the RTM3D detection model, chosen primarily for its shorter training times. We obtain a consistent improvement of 4% in 3D AP (@IoU=0.7) for cars and ~1.8% in 3D AP (@IoU=0.25) for pedestrians and cyclists over the baseline on the KITTI car detection dataset. We also demonstrate a rigorous evaluation of the mAP scores by re-weighting them to take into account the class imbalance in the KITTI validation dataset.

Road Segmentation on low resolution Lidar point clouds for autonomous vehicles

May 27, 2020

Leonardo Gigli, B Ravi Kiran, Thomas Paul, Andres Serna, Nagarjuna Vemuri, Beatriz Marcotegui, Santiago Velasco-Forero

Abstract: Point cloud datasets for perception tasks in the context of autonomous driving often rely on high resolution 64-layer Light Detection and Ranging (LIDAR) scanners. They are expensive to deploy on real-world autonomous driving sensor architectures, which usually employ 16/32 layer LIDARs. We evaluate the effect of subsampling image based representations of dense point clouds on the accuracy of the road segmentation task. In our experiments the low resolution 16/32 layer LIDAR point clouds are simulated by subsampling the original 64 layer data, for subsequent transformation into a feature map in the Bird-Eye-View (BEV) and SphericalView (SV) representations of the point cloud. We introduce the usage of the local normal vector with the LIDAR's spherical coordinates as an input channel to existing LoDNN architectures. We demonstrate that this local normal feature in conjunction with classical features not only improves performance for binary road segmentation on full resolution point clouds, but also reduces the negative impact on accuracy when subsampling dense point clouds, compared to using classical features alone. We assess our method with several experiments on two datasets: the KITTI Road-segmentation benchmark and the recently released Semantic KITTI dataset.

* ISPRS 2020
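
The abstract describes simulating 16 or 32 layer scans by subsampling 64 layer data. A minimal sketch of one way to do this is to keep every k-th laser ring. The abstract does not say which rings are retained, so the uniform stride below is an assumption, as is the point layout `(x, y, z, ring)` with a known integer ring index per point.

```python
def subsample_rings(points, source_layers=64, target_layers=16):
    """Simulate a lower-resolution LiDAR by keeping every k-th ring.

    points: iterable of (x, y, z, ring) tuples, where ring is an integer
            in [0, source_layers) identifying the laser that produced
            the point (assumed to be available).
    Returns the points whose ring index is a multiple of
    source_layers // target_layers.
    """
    if source_layers % target_layers != 0:
        raise ValueError("target_layers must divide source_layers")
    step = source_layers // target_layers
    return [p for p in points if p[3] % step == 0]
```

For a 64 layer scan, `target_layers=32` keeps rings 0, 2, 4, and so on, while `target_layers=16` keeps rings 0, 4, 8, and so on. The reduced scan can then be projected into BEV or spherical-view feature maps in the same way as the full-resolution one.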
