Abstract: Self-supervised learning (SSL) on 3D point clouds has the potential to learn feature representations that can transfer to diverse sensors and multiple downstream perception tasks. However, recent SSL approaches fail to define pretext tasks that retain geometric information such as object pose and scale, which can be detrimental to the performance of downstream localization and geometry-sensitive 3D scene understanding tasks, such as 3D semantic segmentation and 3D object detection. We propose PSA-SSL, a novel extension to point cloud SSL that learns object pose and size-aware (PSA) features. Our approach defines a self-supervised bounding box regression pretext task, which retains object pose and size information. Furthermore, we incorporate LiDAR beam pattern augmentation on input point clouds, which encourages learning sensor-agnostic features. Our experiments demonstrate that with a single pretrained model, our lightweight yet effective extensions achieve significant improvements on 3D semantic segmentation with limited labels across popular autonomous driving datasets (Waymo, nuScenes, SemanticKITTI). Moreover, our approach outperforms other state-of-the-art SSL methods on 3D semantic segmentation (using up to 10 times fewer labels), as well as on 3D object detection. Our code will be released at https://github.com/TRAILab/PSA-SSL.
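The abstract does not specify how the LiDAR beam pattern augmentation is carried out. As a rough illustration only, one plausible form is to bin points into approximate beams by elevation angle and keep a subset of beams, which imitates a sensor with fewer beams. The function name `drop_beams` and the parameters `num_beams` and `keep_every` are hypothetical and are not taken from PSA-SSL.

```python
import numpy as np

def drop_beams(points, num_beams=64, keep_every=2):
    """Keep every `keep_every`-th approximate beam of a point cloud.

    Assumption: points is an (N, 3+) array with x, y, z in the sensor frame,
    and beams can be approximated by uniform bins over the elevation angle.
    """
    points = np.asarray(points, dtype=float)
    # Elevation angle of each point relative to the horizontal plane.
    horizontal_range = np.linalg.norm(points[:, :2], axis=1)
    elevation = np.arctan2(points[:, 2], horizontal_range)
    # Map elevation onto integer beam indices in [0, num_beams - 1].
    low, high = elevation.min(), elevation.max()
    span = max(high - low, 1e-9)
    beam = ((elevation - low) / span * num_beams).astype(int)
    beam = np.minimum(beam, num_beams - 1)
    # Retain only the selected beams.
    return points[beam % keep_every == 0]
```

Applying such a transform with randomly chosen `keep_every` values during pretraining would expose the network to several beam densities, which is the general intent the abstract attributes to the augmentation.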
Abstract: Accurate 3D object detection in all weather conditions remains a key challenge to enable the widespread deployment of autonomous vehicles, as most work to date has been performed on clear weather data. To generalize to adverse weather conditions, supervised methods perform best if trained from scratch on all weather data instead of finetuning a model pretrained on clear weather data. Training from scratch on all data will eventually become computationally infeasible and expensive as datasets continue to grow and encompass the full extent of possible weather conditions. On the other hand, naive finetuning on data from a different weather domain can result in catastrophic forgetting of the previously learned domain. Inspired by the success of replay-based continual learning methods, we propose Gradient-based Maximally Interfered Retrieval (GMIR), a gradient-based sampling strategy for replay. During finetuning, GMIR periodically retrieves samples from the previous domain dataset whose gradient vectors show maximal interference with the gradient vector of the current update. Our 3D object detection experiments on the SeeingThroughFog (STF) dataset show that GMIR not only overcomes forgetting but also offers competitive performance compared to scratch training on all data with a 46.25% reduction in total training time.
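The retrieval step described above can be sketched in a few lines. This is a minimal illustration, assuming that "interference" is measured by cosine similarity between gradient vectors (the most negative values conflict most with the current update); the abstract does not state the exact measure, and the function name `select_max_interfered` and its arguments are hypothetical.

```python
import numpy as np

def select_max_interfered(candidate_grads, current_grad, k):
    """Return indices of the k candidates that conflict most with current_grad.

    Assumption: each row of candidate_grads is the flattened gradient of one
    previous-domain sample, and interference is negative cosine similarity.
    """
    candidate_grads = np.asarray(candidate_grads, dtype=float)
    current_grad = np.asarray(current_grad, dtype=float)
    eps = 1e-12  # guards against division by zero for null gradients
    cand_norms = np.linalg.norm(candidate_grads, axis=1) + eps
    cur_norm = np.linalg.norm(current_grad) + eps
    cosine = candidate_grads @ current_grad / (cand_norms * cur_norm)
    # Ascending sort: the most negative cosine (strongest conflict) comes first.
    order = np.argsort(cosine, kind="stable")
    return order[:k]

# Usage: four previous-domain gradients against one current update.
current = [1.0, 0.0]
candidates = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]
replay_indices = select_max_interfered(candidates, current, k=2)
```

In this toy case the two gradients pointing against the current update are selected for replay, while the aligned and orthogonal ones are skipped.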