Abstract: Scene understanding plays an essential role in enabling autonomous driving and maintaining high standards of performance and safety. To address this task, cameras and laser scanners (LiDARs) have been the most commonly used sensors, with radars being less popular. Despite that, radars remain low-cost, information-dense, and fast sensors that are resistant to adverse weather conditions. While multiple works have previously been presented for radar-based scene semantic segmentation, the nature of radar data still poses a challenge due to its inherent noise and sparsity, as well as the imbalance between foreground and background. In this work, we propose a novel approach to the semantic segmentation of radar scenes that uses a multi-input fusion of radar data, with an architecture and loss functions tailored to tackle the drawbacks of radar perception. The architecture includes an efficient attention block that adaptively captures important feature information. Our method, TransRadar, outperforms state-of-the-art methods on the CARRADA and RADIal datasets while having a smaller model size. Code: https://github.com/YahiDar/TransRadar
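The foreground/background imbalance mentioned above can be illustrated with a generic class-weighted cross-entropy. This is a minimal sketch and not the TransRadar loss, which the abstract does not specify: the inverse-frequency weighting, the function names `inverse_frequency_weights` and `weighted_cross_entropy`, and the toy 10-cell frame are all assumptions made for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Hypothetical helper: weight class c by total / (num_classes * count_c).

    Classes that never occur get weight 0 so they do not affect the loss.
    """
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p[label], normalized by the sum of the weights."""
    numerator = 0.0
    denominator = 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        # Clamp the probability to avoid log(0).
        numerator += -w * math.log(max(p[y], 1e-12))
        denominator += w
    return numerator / denominator if denominator else 0.0


# Toy radar frame (assumed): 9 background cells (class 0), 1 foreground cell (class 1).
labels = [0] * 9 + [1]
# A degenerate model that assigns probability 0.9 to background everywhere.
probs = [[0.9, 0.1]] * 10

weights = inverse_frequency_weights(labels, num_classes=2)
plain = weighted_cross_entropy(probs, labels, [1.0, 1.0])
balanced = weighted_cross_entropy(probs, labels, weights)
print(weights, plain, balanced)
```

With uniform weights, the always-background model is penalized only lightly for missing the single foreground cell. With inverse-frequency weights, that cell contributes as much total weight as the nine background cells combined, so the same model receives a clearly larger loss.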
Abstract: The performance of perception systems developed for autonomous driving vehicles has improved significantly over the last few years. This improvement has been associated with the increasing use of LiDAR sensors and point cloud data to facilitate object detection and recognition in autonomous driving. However, LiDAR and camera systems show degraded performance in unfavorable conditions such as dusty and rainy weather. Radars, on the other hand, operate at relatively longer wavelengths, which allows for much more robust measurements in these conditions. Despite that, radar-centric datasets receive little attention in the development of deep learning techniques for radar perception. In this work, we consider the radar object detection problem, in which radar frequency data is the only input to the detection framework. We further investigate the challenges of using radar-only data in deep learning models. We propose a transformer-based model, named RadarFormer, that utilizes state-of-the-art developments in vision deep learning. Our model also introduces a channel-chirp-time merging module that reduces the size and complexity of our models by more than 10 times without compromising accuracy. Comprehensive experiments on the CRUW radar dataset demonstrate the advantages of the proposed method. RadarFormer performs favorably against state-of-the-art methods while being 2x faster during inference and requiring only one-tenth of their model parameters. The code associated with this paper is available at https://github.com/YahiDar/RadarFormer.