Title: ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution

Oct 17, 2024

Junhao Gu, Peng-Tao Jiang, Hao Zhang, Mi Zhou, Jinwei Chen, Wenming Yang, Bo Li

Abstract: Real-world image super-resolution (Real-ISR) aims at restoring high-quality (HQ) images from low-quality (LQ) inputs corrupted by unknown and complex degradations. In particular, pretrained text-to-image (T2I) diffusion models provide strong generative priors to reconstruct credible and intricate details. However, T2I generation focuses on semantic consistency while Real-ISR emphasizes pixel-level reconstruction, which hinders existing methods from fully exploiting diffusion priors. To address this challenge, we introduce ConsisSR to handle both semantic and pixel-level consistency. First, instead of relying on coarse-grained text prompts alone, we exploit the more powerful CLIP image embedding and leverage both modalities through our Hybrid Prompt Adapter (HPA) for semantic guidance. Second, we introduce Time-aware Latent Augmentation (TALA) to mitigate the inherent gap between T2I generation and Real-ISR consistency requirements. By randomly mixing LQ and HQ latent inputs, our model not only handles timestep-specific diffusion noise but also refines the accumulated latent representations. Finally, our GAN-Embedding strategy employs the pretrained Real-ESRGAN model to refine the diffusion start point. This accelerates the inference process to 10 steps while preserving sampling quality, in a training-free manner. Our method demonstrates state-of-the-art performance among both full-scale and accelerated models. The code will be made publicly available.
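
The abstract names two mechanisms without giving their formulas: randomly mixing LQ and HQ latents (TALA), and starting diffusion from a Real-ESRGAN output (GAN-Embedding). The sketch below is only an illustration of how such ideas could look, not the authors' implementation. The function names, the linear time-dependent mixing probability, and the use of the standard DDPM forward step x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps to build the start point are all assumptions. Latents are modeled as flat lists of floats to keep the example dependency-free.

```python
import math
import random


def mix_latents(lq, hq, t, num_steps, rng):
    """Randomly mix LQ and HQ latents element by element.

    Assumption (not from the paper): the probability of keeping the LQ
    value grows linearly with the timestep t, so noisier steps see more
    of the LQ latent and cleaner steps see more of the HQ latent.
    """
    p_lq = t / (num_steps - 1)
    return [l if rng.random() < p_lq else h for l, h in zip(lq, hq)]


def noised_start(gan_latent, alpha_bar_t, rng):
    """Forward-diffuse a latent to an intermediate timestep.

    Uses the standard DDPM forward step. Treating a Real-ESRGAN output
    latent as x_0 here is an assumption about what "refining the
    diffusion start point" could mean.
    """
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in gan_latent]


rng = random.Random(0)
lq_latent = [0.0, 0.0, 0.0, 0.0]   # hypothetical LQ latent
hq_latent = [1.0, 1.0, 1.0, 1.0]   # hypothetical HQ latent

# At t = 0 only HQ values are kept; at the last step only LQ values.
mixed_clean = mix_latents(lq_latent, hq_latent, t=0, num_steps=10, rng=rng)
mixed_noisy = mix_latents(lq_latent, hq_latent, t=9, num_steps=10, rng=rng)

# With alpha_bar_t = 1.0 no noise is added and the latent is unchanged.
start = noised_start(hq_latent, alpha_bar_t=1.0, rng=rng)
```

Starting the sampler from a partially noised, already restored latent rather than from pure noise is what would allow a short schedule such as the 10 steps mentioned above; the exact timestep and noise level the authors use are not stated in the abstract.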
