Abstract:Denoising Diffusion Probabilistic Models (DDPMs) have gained great attention in adversarial purification. Current diffusion-based works focus on designing effective condition-guided mechanisms while ignoring a fundamental problem, i.e., the original DDPM sampling is intended for stable generation, which may not be the optimal solution for adversarial purification. Inspired by the stability of the Denoising Diffusion Implicit Model (DDIM), we propose an opposite sampling scheme called random sampling. In brief, random sampling will sample from a random noisy space during each diffusion process, while DDPM and DDIM sampling will continuously sample from the adjacent or original noisy space. Thus, random sampling obtains more randomness and achieves stronger robustness against adversarial attacks. Correspondingly, we also introduce a novel mediator conditional guidance to guarantee the consistency of the prediction under the purified image and clean image input. To expand awareness of guided diffusion purification, we conduct a detailed evaluation with different sampling methods and our random sampling achieves an impressive improvement in multiple settings. Leveraging mediator-guided random sampling, we also establish a baseline method named DiffAP, which significantly outperforms state-of-the-art (SOTA) approaches in performance and defensive stability. Remarkably, under strong attack, our DiffAP even achieves a more than 20% robustness advantage with 10$\times$ sampling acceleration.
Abstract:Snapshot Compressive Imaging (SCI) relies on decoding algorithms such as CNN or Transformer to reconstruct the hyperspectral image (HSI) from its compressed measurement. Although existing CNN and Transformer-based methods have proven effective, CNNs are limited by their inadequate modeling of long-range dependencies, while Transformer ones face high computational costs due to quadratic complexity. Recent Mamba models have demonstrated superior performance over CNN and Transformer-based architectures in some visual tasks, but these models have not fully utilized the local similarities in both spatial and spectral dimensions. Moreover, the long-sequence modeling capability of SSM may offer an advantage in processing the numerous spectral bands for HSI reconstruction, which has not yet been explored. In this paper, we introduce a State Space Model with Across-Scanning and Local Enhancement, named ASLE-SSM, that employs a Spatial-Spectral SSM for global-local balanced context encoding and cross-channel interaction promoting. Specifically, we introduce local scanning in the spatial dimension to balance the global and local receptive fields, and then propose our across-scanning method based on spatial-spectral local cubes to leverage local similarities between adjacent spectral bands and pixels to guide the reconstruction process. These two scanning mechanisms extract the HSI's local features while balancing the global perspective without any additional costs. Experimental results illustrate ASLE-SSM's superiority over existing state-of-the-art methods, with an inference speed 2.4 times faster than Transformer-based MST and saving 0.12 (M) of parameters, achieving the lowest computational cost and parameter count.