Abstract: Recent progress in remote sensing image (RSI) super-resolution (SR) has exhibited remarkable performance using deep neural networks, e.g., Convolutional Neural Networks and Transformers. However, existing SR methods often suffer from either a limited receptive field or quadratic computational overhead, resulting in sub-optimal global representation and unacceptable computational costs for large-scale RSI. To alleviate these issues, we make the first attempt to integrate the Vision State Space Model (Mamba) into RSI-SR, which specializes in processing large-scale RSI by capturing long-range dependencies with linear complexity. To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR, to explore spatial and frequency correlations. In particular, FMSR features a multi-level fusion architecture equipped with a Frequency Selection Module (FSM), a Vision State Space Module (VSSM), and a Hybrid Gate Module (HGM) to exploit their complementary merits for effective spatial-frequency fusion. Recognizing that global and local dependencies are complementary and both beneficial for SR, we further recalibrate these multi-level features via learnable scaling adaptors for accurate feature fusion. Extensive experiments on the AID, DOTA, and DIOR benchmarks demonstrate that FMSR outperforms the state-of-the-art Transformer-based method HAT-L by 0.11 dB in average PSNR, while requiring only 28.05% of its memory and 19.08% of its complexity.
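As an illustration of the recalibrated spatial-frequency fusion this abstract describes, below is a minimal PyTorch sketch, not the authors' released code: two learnable scalars rescale the spatial (VSSM-style) and frequency (FSM-style) branches, and a sigmoid gate stands in for the Hybrid Gate Module. All module names, shapes, and the gating form are assumptions.

```python
import torch
import torch.nn as nn

class ScaledFusion(nn.Module):
    """Fuse spatial/frequency features with learnable scaling adaptors (sketch)."""
    def __init__(self, channels: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))  # scale for the spatial branch
        self.beta = nn.Parameter(torch.ones(1))   # scale for the frequency branch
        self.gate = nn.Sequential(                # crude stand-in for the HGM
            nn.Conv2d(2 * channels, channels, kernel_size=1), nn.Sigmoid())

    def forward(self, spatial_feat, freq_feat):
        g = self.gate(torch.cat([spatial_feat, freq_feat], dim=1))
        return g * (self.alpha * spatial_feat) + (1 - g) * (self.beta * freq_feat)

x_spa = torch.randn(1, 64, 48, 48)   # hypothetical spatial-branch feature
x_frq = torch.randn(1, 64, 48, 48)   # hypothetical frequency-branch feature
fused = ScaledFusion(64)(x_spa, x_frq)  # -> (1, 64, 48, 48)
```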
Abstract: Recent efforts have witnessed remarkable progress in Satellite Video Super-Resolution (SVSR). However, most SVSR methods assume the degradation is fixed and known, e.g., bicubic downsampling, which makes them vulnerable in real-world scenes with multiple and unknown degradations. To alleviate this issue, blind SR has become a research hotspot. Nevertheless, existing approaches are mainly engaged in blur kernel estimation while losing sight of another critical aspect of VSR tasks: temporal compensation, especially compensating blurry and smooth pixels with vital sharpness from severely degraded satellite videos. Therefore, this paper proposes a practical Blind SVSR algorithm (BSVSR) to explore sharper cues by considering pixel-wise blur levels in a coarse-to-fine manner. Specifically, we employ multi-scale deformable convolution to coarsely aggregate temporal redundancy into adjacent frames via window-sliding progressive fusion. The adjacent features are then finely merged into the mid-feature using deformable attention, which measures the blur level of each pixel and assigns larger weights to informative pixels, thus promoting the representation of sharpness. Moreover, we devise a pyramid spatial transformation module to adjust the solution space of the sharp mid-feature, enabling flexible feature adaptation in multi-level domains. Quantitative and qualitative evaluations on both simulated and real-world satellite videos demonstrate that BSVSR performs favorably against state-of-the-art non-blind and blind SR models. Code will be available at https://github.com/XY-boy/Blind-Satellite-VSR
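To make the blur-aware merging idea concrete, here is a hedged sketch of weighting aligned adjacent-frame features by per-pixel sharpness before summing them into a mid-frame feature. The paper uses deformable attention to measure blur; the Laplacian-magnitude score below is our own simplification, not BSVSR's mechanism.

```python
import torch
import torch.nn.functional as F

def sharpness_weighted_merge(feats):
    """feats: (T, C, H, W) aligned adjacent-frame features -> (C, H, W)."""
    lap = torch.tensor([[0., 1., 0.],
                        [1., -4., 1.],
                        [0., 1., 0.]]).view(1, 1, 3, 3)
    scores = []
    for f in feats:  # each (C, H, W)
        resp = F.conv2d(f.mean(0, keepdim=True).unsqueeze(0), lap, padding=1)
        scores.append(resp.abs())                       # (1, 1, H, W) sharpness proxy
    w = torch.softmax(torch.cat(scores, dim=0), dim=0)  # (T, 1, H, W), sums to 1 over T
    return (w * feats).sum(dim=0)                       # sharper pixels dominate

mid_feature = sharpness_weighted_merge(torch.randn(5, 64, 32, 32))  # (64, 32, 32)
```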
Abstract: Hyperspectral images play a crucial role in precision agriculture, environmental monitoring, and ecological analysis. However, due to limitations of the sensor equipment and the imaging environment, observed hyperspectral images are often inevitably corrupted by various types of noise. In this study, we propose a truncated diffusion model, called TDiffDe, to gradually recover the useful information in hyperspectral images. Unlike generation tasks that start from pure noise, the input in hyperspectral image denoising already contains image information. We therefore truncate the trained diffusion model at small steps to avoid destroying this valid information.
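A minimal sketch of truncated reverse sampling, under our own assumptions rather than the TDiffDe release: the observed noisy HSI is treated as the state at a small step t_trunc (far below the full step count T), and a standard DDPM ancestral sampler runs only from there, so most of the image content is never destroyed. `model` and the beta schedule are placeholders.

```python
import torch

def truncated_denoise(model, y, betas, t_trunc):
    """y: noisy HSI treated as x_{t_trunc}; betas: (T,) noise schedule."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = y
    for t in range(t_trunc - 1, -1, -1):
        eps = model(x, torch.tensor([t]))               # predicted noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])    # DDPM posterior mean
        if t > 0:                                       # no noise at the last step
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x
```

With t_trunc much smaller than T, the loop runs only a few steps, which is also why truncation reduces inference cost.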
Abstract: Recently, convolutional networks have achieved remarkable development in remote sensing image Super-Resolution (SR) by minimizing regression objectives, e.g., the MSE loss. However, despite their impressive performance, these methods often suffer from poor visual quality and over-smoothed results. Generative adversarial networks have the potential to infer intricate details, but they are prone to collapse, resulting in undesirable artifacts. To mitigate these issues, in this paper we introduce the Diffusion Probabilistic Model (DPM) for efficient remote sensing image SR, dubbed EDiffSR. EDiffSR is easy to train and retains the merits of DPMs in generating perceptually pleasant images. Specifically, unlike previous works that use a heavy UNet for noise prediction, we develop an Efficient Activation Network (EANet) that achieves favorable noise-prediction performance through simplified channel attention and a simple gate operation, which dramatically reduces the computational budget. Moreover, to introduce more valuable prior knowledge into EDiffSR, a practical Conditional Prior Enhancement Module (CPEM) is developed to extract an enriched condition. Unlike most DPM-based SR models, which generate conditions by directly amplifying LR images, the proposed CPEM retains more informative cues for accurate SR. Extensive experiments on four remote sensing datasets demonstrate, both quantitatively and qualitatively, that EDiffSR restores visually pleasant images on simulated and real-world remote sensing data. The code of EDiffSR will be available at https://github.com/XY-boy/EDiffSR
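The two operations the abstract names are known from NAFNet-style blocks; the sketch below shows them in that commonly used form, as an assumption about EDiffSR's exact design rather than its released code. The simple gate halves the channels via an elementwise product, and simplified channel attention drops the usual reduction MLP and activation.

```python
import torch
import torch.nn as nn

class SimpleGate(nn.Module):
    """Split channels in half and multiply elementwise (parameter-free gate)."""
    def forward(self, x):
        a, b = x.chunk(2, dim=1)
        return a * b

class SimplifiedChannelAttention(nn.Module):
    """Global pooling + one 1x1 conv, no reduction MLP or activation."""
    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        return x * self.proj(self.pool(x))   # per-channel reweighting

x = torch.randn(1, 64, 32, 32)
y = SimpleGate()(x)                          # -> (1, 32, 32, 32)
z = SimplifiedChannelAttention(32)(y)        # -> (1, 32, 32, 32)
```

Both operations cost far less than full attention or deep gating MLPs, which is consistent with the abstract's emphasis on a reduced computational budget.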
Abstract: Optical-flow-based and kernel-based approaches have been widely explored for temporal compensation in satellite video super-resolution (VSR). However, these techniques are computationally expensive and prone to failure under complex motion. In this paper, we propose to exploit the well-defined temporal difference for efficient and robust temporal compensation. To fully utilize the temporal information within frames, we separately model the short-term and long-term temporal discrepancies, since they provide distinctive and complementary properties. Specifically, a short-term temporal difference module is designed to extract local motion representations from residual maps between adjacent frames, providing more clues for accurate texture representation. Meanwhile, the global dependency across the entire frame sequence is explored via long-term difference learning: the differences between forward and backward segments are incorporated and activated to modulate the temporal features, yielding holistic global compensation. Besides, we further propose a difference compensation unit to enrich the interaction between the spatial distribution of the target frame and the compensated results, which helps maintain spatial consistency while refining features to avoid misalignment. Extensive objective and subjective evaluations on five mainstream satellite videos demonstrate that the proposed method performs favorably for satellite VSR. Code will be available at \url{https://github.com/XY-boy/TDMVSR}
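A hedged sketch of the short-term branch, assuming a small convolutional head of our own choosing: residual maps between adjacent frames are cheap to compute (no flow estimation or kernel prediction) and directly expose local motion, which is the efficiency argument the abstract makes.

```python
import torch
import torch.nn as nn

class ShortTermDiff(nn.Module):
    """Encode forward/backward residual maps into a motion feature (sketch)."""
    def __init__(self, in_ch: int = 3, feat_ch: int = 32):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(2 * in_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1))

    def forward(self, prev, cur, nxt):
        d_bwd = cur - prev            # residual map w.r.t. the previous frame
        d_fwd = nxt - cur             # residual map w.r.t. the next frame
        return self.encode(torch.cat([d_bwd, d_fwd], dim=1))

frames = torch.randn(3, 1, 3, 64, 64)                       # prev / cur / next
motion = ShortTermDiff()(frames[0], frames[1], frames[2])   # (1, 32, 64, 64)
```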
Abstract: In the fields of image restoration and image fusion, model-driven and data-driven methods are the two representative frameworks, each with its own advantages and disadvantages. Model-driven methods consider the imaging mechanism, which is deterministic and theoretically sound; however, they cannot easily model complicated nonlinear problems. Data-driven methods have a strong capability for learning prior knowledge from massive data, especially nonlinear statistical features; however, their interpretability is poor, and they are over-dependent on training data. In this paper, we systematically investigate the coupling of model-driven and data-driven methods, which has rarely been considered in the remote sensing image restoration and fusion communities. We are the first to summarize the coupling approaches into the following three categories: 1) data-driven and model-driven cascading methods; 2) variational models with embedded learning; and 3) model-constrained network learning methods. Typical existing and potential coupling methods for remote sensing image restoration and fusion are introduced with application examples. This paper also gives new insights into potential future directions, in terms of both methods and applications.
Abstract: Data fusion aims to aggregate the characteristics of different data sources and obtain products that combine their advantages. To solve the problem of reduced resolution of polarimetric synthetic aperture radar (PolSAR) images caused by system limitations, we propose a network that fuses fully polarimetric SAR (PolSAR) images with single-polarization SAR (SinSAR) images to generate high-resolution PolSAR (HR-PolSAR) images. To exploit the polarimetric information of the low-resolution PolSAR (LR-PolSAR) image and the spatial information of the high-resolution single-polarization SAR (HR-SinSAR) image, we propose a framework for the joint fusion of LR-PolSAR and HR-SinSAR images and design a cross-attention mechanism to extract features from the joint input. In addition, based on the physical imaging mechanism, we design a PolSAR polarimetric loss function to constrain network training. The experimental results confirm the superiority of the proposed fusion network over traditional algorithms: the average PSNR increases by more than 3.6 dB, and the average MAE falls below 0.07. Experiments on polarimetric decomposition and polarimetric signatures show that the network preserves polarimetric information well.
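As an illustration of cross-attention between the two inputs, here is a minimal sketch (our own layout, not the paper's architecture): flattened LR-PolSAR features act as queries and attend to HR-SinSAR features as keys and values, letting the polarimetric branch pull in spatial detail. Shapes and the use of nn.MultiheadAttention are assumptions.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """PolSAR features query spatial detail from SinSAR features (sketch)."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, polsar_feat, sinsar_feat):
        """Both inputs: (B, C, H, W) feature maps with matching sizes."""
        B, C, H, W = polsar_feat.shape
        q = polsar_feat.flatten(2).transpose(1, 2)    # (B, HW, C) queries
        kv = sinsar_feat.flatten(2).transpose(1, 2)   # (B, HW, C) keys/values
        out, _ = self.attn(q, kv, kv)
        return out.transpose(1, 2).reshape(B, C, H, W)

pol = torch.randn(1, 64, 16, 16)
sin = torch.randn(1, 64, 16, 16)
fused = CrossAttention()(pol, sin)   # -> (1, 64, 16, 16)
```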
Abstract: Hyperspectral images are crucial for many research fields. Spectral super-resolution (SSR) is a technique for obtaining high-spatial-resolution (HR) hyperspectral images from HR multispectral images. Traditional SSR methods include model-driven algorithms and deep learning. By unfolding a variational method, this paper proposes an optimization-driven convolutional neural network (CNN) with a deep spatial-spectral prior, resulting in a physically interpretable network. Unlike fully data-driven CNNs, an auxiliary spectral response function (SRF) is utilized to guide the CNN to group bands with spectral relevance. In addition, a channel attention module (CAM) and a reformulated spectral angle mapper (SAM) loss function are applied to achieve an effective reconstruction model. Finally, experiments on two types of datasets, comprising natural and remote sensing images, demonstrate the spectral enhancement effect of the proposed method. The classification results on the remote sensing dataset also verify the validity of the information enhanced by the proposed method.
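For reference, below is the standard spectral angle mapper loss in PyTorch: the mean angle between the predicted and reference spectra at each pixel. This is the conventional form; the paper's reformulated version may differ.

```python
import torch

def sam_loss(pred, target, eps=1e-8):
    """pred, target: (B, C, H, W) cubes with C spectral bands."""
    dot = (pred * target).sum(dim=1)                      # per-pixel inner product
    norm = pred.norm(dim=1) * target.norm(dim=1)          # product of spectral norms
    cos = (dot / (norm + eps)).clamp(-1 + 1e-7, 1 - 1e-7) # keep acos well-defined
    return torch.acos(cos).mean()                         # mean angle in radians

loss = sam_loss(torch.rand(2, 31, 8, 8), torch.rand(2, 31, 8, 8))
```

Unlike MSE, this objective is invariant to per-pixel spectral scaling, so it penalizes distortions of spectral shape rather than brightness.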
Abstract: Hybrid noise in hyperspectral images (HSIs) severely degrades data quality, reduces the interpretation accuracy of HSIs, and restricts subsequent HSI applications. In this paper, a spatial-spectral gradient network (SSGN) is presented for mixed noise removal in HSIs. The proposed method employs a spatial-spectral gradient learning strategy that accounts for the unique directional spatial structure of sparse noise and exploits spectral differences as complementary information for better extracting the intrinsic, deep features of HSIs. Built on a fully cascaded multi-scale convolutional network, SSGN can simultaneously handle the different types of noise in different HSIs or spectra with a single model. The simulated- and real-data experiments undertaken in this study confirm that the proposed SSGN performs better at mixed noise removal than other state-of-the-art HSI denoising algorithms, in terms of evaluation indices, visual assessment, and time consumption.
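A minimal sketch of the gradient inputs such a strategy relies on (our reading of the abstract, not SSGN's code): horizontal and vertical spatial differences expose directional sparse noise such as stripes, while band-wise differences expose spectral anomalies.

```python
import torch

def spatial_spectral_gradients(hsi):
    """hsi: (B, C, H, W) hyperspectral cube with C bands."""
    g_h = hsi[..., :, 1:] - hsi[..., :, :-1]   # horizontal spatial gradient
    g_v = hsi[..., 1:, :] - hsi[..., :-1, :]   # vertical spatial gradient
    g_s = hsi[:, 1:] - hsi[:, :-1]             # spectral gradient across bands
    return g_h, g_v, g_s

g_h, g_v, g_s = spatial_spectral_gradients(torch.rand(1, 31, 64, 64))
# shapes: (1, 31, 64, 63), (1, 31, 63, 64), (1, 30, 64, 64)
```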
Abstract: Destriping is a classical problem in remote sensing image processing. Although considerable effort has been devoted to removing stripes, few existing methods can eliminate stripe noise with arbitrary orientations, leaving the removal of oblique stripes in higher-level remote sensing products an unresolved and urgent issue. To overcome this challenging problem, we propose a novel destriping model that is self-adjusted to different orientations of stripe noise. First, an oriented variation model is designed to approximate the stripe orientation: the stripe direction is automatically estimated and then embedded into the constraint term to describe the along-stripe smoothness of the stripe component. Based mainly on the oriented variation model, a complete destriping framework is proposed that jointly employs an L1-norm constraint and TV regularization to separately capture the global distribution property of the stripe component and the piecewise smoothness of the clean image. Qualitative and quantitative experimental results on both orientation estimation and destriping confirm the effectiveness and stability of the proposed method.
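A hedged formulation of the joint model described above, with symbols of our own choosing rather than the paper's exact notation: decomposing the observed image f into a clean image u and a stripe component s, one plausible objective is

```latex
\min_{u,\,s}\;
\underbrace{\mathrm{TV}(u)}_{\text{piecewise smoothness of clean image}}
+\lambda_1 \underbrace{\|s\|_1}_{\text{global sparsity of stripes}}
+\lambda_2 \underbrace{\|\nabla_{\theta}\, s\|_1}_{\text{along-stripe smoothness}}
\quad \text{s.t.}\quad f = u + s,
```

where \(\nabla_{\theta}\) denotes the derivative along the automatically estimated stripe direction \(\theta\), and \(\lambda_1, \lambda_2\) balance the terms. Embedding \(\theta\) in the oriented term is what frees the model from the usual horizontal/vertical stripe assumption.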