Abstract: Translating near-infrared (NIR) images to the visible spectrum is challenging because of the complexity of the cross-domain mapping. Current models struggle to balance a broad receptive field with computational efficiency, which limits their practical use. The Selective Structured State Space Model, and in particular its improved variant Mamba, excels at generative tasks because it captures long-range dependencies with linear complexity; however, its default approach of flattening 2D images into 1D sequences neglects local context. In this work, we propose a simple but effective backbone, dubbed ColorMamba, which is the first to introduce Mamba into spectral translation tasks. To capture both global long-range dependencies and local context for efficient spectral translation, we introduce learnable padding tokens that make image boundaries more distinct and prevent potential confusion within the sequence model. Furthermore, we design local convolutional enhancement and agent attention to improve the vanilla Mamba. Moreover, we exploit the HSV color space to provide multi-scale guidance during reconstruction for more accurate spectral translation. Extensive experiments show that ColorMamba improves PSNR by 1.02 dB over the state-of-the-art method. Our code is available at https://github.com/AlexYangxx/ColorMamba.
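The role of the padding tokens can be illustrated with a minimal sketch. It assumes that the 2D feature map is scanned row by row and that one padding token is appended after each row; the exact placement used in ColorMamba may differ, and the helper name `flatten_with_row_padding` is hypothetical. In the real model the token would be a trained parameter; here it is a fixed NumPy array so the flattening step can be inspected on its own.

```python
import numpy as np


def flatten_with_row_padding(feat: np.ndarray, pad_token: np.ndarray) -> np.ndarray:
    """Flatten an (H, W, C) feature map in raster order into an
    (H * (W + 1), C) sequence, appending pad_token after every row.

    Hypothetical helper: the padding placement is an assumption, not
    necessarily ColorMamba's exact scheme.
    """
    h, w, c = feat.shape
    if pad_token.shape != (c,):
        raise ValueError("pad_token must have shape (C,)")
    # Repeat the single padding token once per row: shape (H, 1, C).
    pads = np.broadcast_to(pad_token, (h, 1, c))
    # Append the token at the end of each row, then scan row by row.
    padded = np.concatenate([feat, pads], axis=1)  # (H, W + 1, C)
    return padded.reshape(h * (w + 1), c)


# Usage: a 2 x 3 single-channel map with the padding token set to -1.
feat = np.arange(6, dtype=float).reshape(2, 3, 1)
seq = flatten_with_row_padding(feat, np.array([-1.0]))
print(seq[:, 0])
```

Without the extra token, the last pixel of one row would sit directly next to the first pixel of the following row in the 1D sequence even though they are far apart in the image; the token gives the sequence model an explicit boundary marker at that point.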
Abstract: Recently, few-shot object detection in remote sensing imagery has advanced significantly. Despite this progress, continually learning new concepts remains a significant challenge for existing methods. In this paper, we study the task of incremental few-shot object detection (iFSOD) in remote sensing images. We introduce a fine-tuning-based method, termed InfRS, that learns novel classes incrementally from a small number of examples while preserving performance on the base classes, without revisiting previous datasets. Specifically, we pretrain the model on abundant base-class data and then generate a set of class-wise prototypes that represent the intrinsic characteristics of the data. In the incremental learning stage, we introduce a Hybrid Prototypical Contrastive (HPC) encoding module to learn discriminative representations. Furthermore, we develop a prototypical calibration strategy based on the Wasserstein distance to mitigate catastrophic forgetting. Comprehensive evaluations on the NWPU VHR-10 and DIOR datasets demonstrate that our model effectively addresses the iFSOD problem in remote sensing images. Code will be released.
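Class-wise prototypes and a Wasserstein-based comparison can be combined as in the sketch below. It assumes, as an illustration rather than InfRS's actual formulation, that each class prototype is a diagonal Gaussian given by the per-dimension mean and standard deviation of that class's features. For two such Gaussians the 2-Wasserstein distance has the closed form sqrt(||mu1 - mu2||^2 + ||sd1 - sd2||^2). The names `class_prototypes`, `w2_diag_gaussian`, and `drift_penalty` are hypothetical, and summing the distances between old and updated base-class prototypes is only one plausible calibration signal.

```python
import numpy as np


def class_prototypes(features: np.ndarray, labels: np.ndarray) -> dict:
    """Return {class_id: (mean, std)} over (N, D) features.

    Assumption: each prototype is modeled as a diagonal Gaussian.
    """
    protos = {}
    for cls in np.unique(labels):
        f = features[labels == cls]
        protos[int(cls)] = (f.mean(axis=0), f.std(axis=0))
    return protos


def w2_diag_gaussian(mu1, sd1, mu2, sd2) -> float:
    """Closed-form 2-Wasserstein distance between
    N(mu1, diag(sd1**2)) and N(mu2, diag(sd2**2))."""
    return float(np.sqrt(np.sum((mu1 - mu2) ** 2) + np.sum((sd1 - sd2) ** 2)))


def drift_penalty(old_protos: dict, new_protos: dict) -> float:
    """Hypothetical calibration signal: total W2 drift of base-class
    prototypes that appear in both the old and the updated model."""
    return sum(
        w2_diag_gaussian(*old_protos[c], *new_protos[c])
        for c in old_protos
        if c in new_protos
    )


# Usage: two classes with two 2D feature vectors each.
feats = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]])
labels = np.array([0, 0, 1, 1])
protos = class_prototypes(feats, labels)
print(w2_diag_gaussian(*protos[0], *protos[1]))
```

A penalty of this kind is zero when the updated model reproduces the stored base-class statistics exactly and grows as they drift, which matches the stated goal of mitigating catastrophic forgetting without revisiting the base-class data.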