Abstract:Wide-FoV cameras, like fisheye and panoramic setups, are essential for broader perception but introduce significant distortions in 180{\deg} and 360{\deg} images, complicating dense prediction tasks. For instance, existing MAMBA models lacking distortion-aware capacity cannot perform well in panoramic semantic segmentation. To address this problem, this work presents Deformable Mamba, a unified framework specifically designed to address imaging distortions within the context of panoramic and fisheye semantic segmentation. At the core is a decoder constructed with a series of Deformable Mamba Fusion (DMF) blocks, making the whole framework more deformable, efficient, and accurate, when handling extreme distortions. Extensive evaluations across five datasets demonstrate that our method consistently improves segmentation accuracy compared to the previous state-of-the-art methods tailored for specific FoVs. Notably, Deformable Mamba achieves a +2.5% performance improvement on the 360{\deg} Stanford2D3D dataset, and shows better results across FoVs from 60{\deg} to 360{\deg}.
Abstract:In the field of autonomous driving, Bird's-Eye-View (BEV) perception has attracted increasing attention in the community since it provides more comprehensive information compared with pinhole front-view images and panoramas. Traditional BEV methods, which rely on multiple narrow-field cameras and complex pose estimations, often face calibration and synchronization issues. To break the wall of the aforementioned challenges, in this work, we introduce OneBEV, a novel BEV semantic mapping approach using merely a single panoramic image as input, simplifying the mapping process and reducing computational complexities. A distortion-aware module termed Mamba View Transformation (MVT) is specifically designed to handle the spatial distortions in panoramas, transforming front-view features into BEV features without leveraging traditional attention mechanisms. Apart from the efficient framework, we contribute two datasets, i.e., nuScenes-360 and DeepAccident-360, tailored for the OneBEV task. Experimental results showcase that OneBEV achieves state-of-the-art performance with 51.1% and 36.1% mIoU on nuScenes-360 and DeepAccident-360, respectively. This work advances BEV semantic mapping in autonomous driving, paving the way for more advanced and reliable autonomous systems.
Abstract:Data scarcity and data imbalance have attracted a lot of attention in many fields. Data augmentation, explored as an effective approach to tackle them, can improve the robustness and efficiency of classification models by generating new samples. This paper presents REPRINT, a simple and effective hidden-space data augmentation method for imbalanced data classification. Given hidden-space representations of samples in each class, REPRINT extrapolates, in a randomized fashion, augmented examples for target class by using subspaces spanned by principal components to summarize distribution structure of both source and target class. Consequently, the examples generated would diversify the target while maintaining the original geometry of target distribution. Besides, this method involves a label refinement component which allows to synthesize new soft labels for augmented examples. Compared with different NLP data augmentation approaches under a range of data imbalanced scenarios on four text classification benchmark, REPRINT shows prominent improvements. Moreover, through comprehensive ablation studies, we show that label refinement is better than label-preserving for augmented examples, and that our method suggests stable and consistent improvements in terms of suitable choices of principal components. Moreover, REPRINT is appealing for its easy-to-use since it contains only one hyperparameter determining the dimension of subspace and requires low computational resource.