Abstract: Image fusion aims to generate a high-resolution multi/hyper-spectral image by combining a high-resolution image with limited spectral information and a low-resolution image with abundant spectral data. Current deep learning (DL)-based methods for image fusion primarily rely on CNNs or Transformers to extract features and merge different types of data. While CNNs are efficient, their receptive fields are limited, restricting their capacity to capture global context. Conversely, Transformers excel at learning global information but are hindered by their quadratic complexity. Fortunately, recent advancements in the State Space Model (SSM), particularly Mamba, offer a promising solution to this issue by enabling global awareness with linear complexity. However, there have been few attempts to explore the potential of SSMs in information fusion, which is a crucial ability in domains like image fusion. Therefore, we propose FusionMamba, an innovative method for efficient image fusion. Our contributions focus on two aspects. First, recognizing that images from different sources possess distinct properties, we incorporate Mamba blocks into two U-shaped networks, presenting a novel architecture that extracts spatial and spectral features in an efficient, independent, and hierarchical manner. Second, to effectively combine spatial and spectral information, we extend the Mamba block to accommodate dual inputs. This extension yields a new module called the FusionMamba block, which outperforms existing fusion techniques such as concatenation and cross-attention. To validate FusionMamba's effectiveness, we conduct a series of experiments on five datasets related to three image fusion tasks. The quantitative and qualitative evaluation results demonstrate that our method achieves state-of-the-art (SOTA) performance.
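To make the "global awareness with linear complexity" claim concrete, the following is a minimal sketch of a state space recurrence, not the FusionMamba implementation. It assumes a single channel and a scalar hidden state, and it omits the discretization step and learned projections that Mamba uses. The function names `ssm_scan` and `dual_input_scan` are hypothetical, and the way the second input drives the per-step gates is only one plausible reading of "dual inputs".

```python
from typing import List, Sequence


def ssm_scan(
    x: Sequence[float],
    a: float,
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Scalar-state linear recurrence (simplified SSM).

    h_t = a * h_{t-1} + b_t * x_t
    y_t = c_t * h_t

    The sequence is visited once, so the cost grows linearly with len(x),
    yet every output depends on all earlier inputs through the state h.
    """
    h = 0.0
    y: List[float] = []
    for x_t, b_t, c_t in zip(x, b, c):
        h = a * h + b_t * x_t  # state update carries past context forward
        y.append(c_t * h)      # readout of the current state
    return y


def dual_input_scan(
    x_main: Sequence[float],
    x_guide: Sequence[float],
    a: float = 0.5,
) -> List[float]:
    """Hypothetical dual-input variant (an assumption, not the paper's block).

    The guide sequence sets the per-step input gate b_t, so one source
    controls how much of the other source enters the state.
    """
    b = list(x_guide)           # input gate taken from the second source
    c = [1.0] * len(x_guide)    # identity readout, for simplicity
    return ssm_scan(x_main, a, b, c)


if __name__ == "__main__":
    # The guide value 0.0 at step 2 blocks that input; the state only decays.
    print(dual_input_scan([1.0, 1.0, 1.0], [1.0, 0.0, 1.0], a=0.5))
```

By contrast, self-attention compares every position with every other position, which is where the quadratic cost mentioned above comes from.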
Abstract: In image fusion tasks, images from different sources possess distinct properties, so treating them identically leads to inadequate feature extraction. In addition, multi-scale networks capture information more thoroughly than single-scale models in pixel-wise problems. In light of these factors, we propose a source-aware, spatial-spectral-integrated double U-shaped network called $\rm{(SU)^2}$Net. The network is mainly composed of a spatial U-net and a spectral U-net, which learn spatial details and spectral characteristics separately and hierarchically. In contrast to most previous works, which simply apply concatenation to integrate spatial and spectral information, a novel structure named the spatial-spectral block ($\rm{S^2}$Block) is specially designed to merge feature maps from different sources effectively. Experimental results show that our method outperforms representative state-of-the-art (SOTA) approaches in both quantitative and qualitative evaluations on a variety of image fusion tasks, including remote sensing pansharpening and hyperspectral image super-resolution (HISR).