Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jinshan Pan

The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

Apr 14, 2025

Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin(+136 more)

Abstract:This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the $\operatorname{DIV2K\_LSDIR\_test}$ dataset. A robust participation saw \textbf{244} registered entrants, with \textbf{43} teams submitting valid entries. This report meticulously analyzes these methods and results, emphasizing groundbreaking advancements in state-of-the-art single-image ESR techniques. The analysis highlights innovative approaches and establishes benchmarks for future research in the field.

* Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

Via

Access Paper or Ask Questions

Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration

Mar 26, 2025

Shihao Zhou, Dayu Li, Jinshan Pan, Juncheng Zhou, Jinglei Shi, Jufeng Yang

Abstract:Transformer-based approaches have gained significant attention in image restoration, where the core component, i.e, Multi-Head Attention (MHA), plays a crucial role in capturing diverse features and recovering high-quality results. In MHA, heads perform attention calculation independently from uniform split subspaces, and a redundancy issue is triggered to hinder the model from achieving satisfactory outputs. In this paper, we propose to improve MHA by exploring diverse learners and introducing various interactions between heads, which results in a Hierarchical multI-head atteNtion driven Transformer model, termed HINT, for image restoration. HINT contains two modules, i.e., the Hierarchical Multi-Head Attention (HMHA) and the Query-Key Cache Updating (QKCU) module, to address the redundancy problem that is rooted in vanilla MHA. Specifically, HMHA extracts diverse contextual features by employing heads to learn from subspaces of varying sizes and containing different information. Moreover, QKCU, comprising intra- and inter-layer schemes, further reduces the redundancy problem by facilitating enhanced interactions between attention heads within and across layers. Extensive experiments are conducted on 12 benchmarks across 5 image restoration tasks, including low-light enhancement, dehazing, desnowing, denoising, and deraining, to demonstrate the superiority of HINT. The source code is available in the supplementary materials.

* 11 pages, 10 figures

Via

Access Paper or Ask Questions

Intra and Inter Parser-Prompted Transformers for Effective Image Restoration

Mar 18, 2025

Cong Wang, Jinshan Pan, Liyan Wang, Wei Wang

Abstract:We propose Intra and Inter Parser-Prompted Transformers (PPTformer) that explore useful features from visual foundation models for image restoration. Specifically, PPTformer contains two parts: an Image Restoration Network (IRNet) for restoring images from degraded observations and a Parser-Prompted Feature Generation Network (PPFGNet) for providing IRNet with reliable parser information to boost restoration. To enhance the integration of the parser within IRNet, we propose Intra Parser-Prompted Attention (IntraPPA) and Inter Parser-Prompted Attention (InterPPA) to implicitly and explicitly learn useful parser features to facilitate restoration. The IntraPPA re-considers cross attention between parser and restoration features, enabling implicit perception of the parser from a long-range and intra-layer perspective. Conversely, the InterPPA initially fuses restoration features with those of the parser, followed by formulating these fused features within an attention mechanism to explicitly perceive parser information. Further, we propose a parser-prompted feed-forward network to guide restoration within pixel-wise gating modulation. Experimental results show that PPTformer achieves state-of-the-art performance on image deraining, defocus deblurring, desnowing, and low-light enhancement.

* This version is accepted by the Association for the Advancement of Artificial Intelligence (AAAI-25)

Via

Access Paper or Ask Questions

MFSR: Multi-fractal Feature for Super-resolution Reconstruction with Fine Details Recovery

Feb 27, 2025

Lianping Yang, Peng Jiao, Jinshan Pan, Hegui Zhu, Su Guo

$Figure 1 for MFSR: Multi-fractal Feature for Super-resolution Reconstruction with Fine Details Recovery$

$Figure 2 for MFSR: Multi-fractal Feature for Super-resolution Reconstruction with Fine Details Recovery$

$Figure 3 for MFSR: Multi-fractal Feature for Super-resolution Reconstruction with Fine Details Recovery$

$Figure 4 for MFSR: Multi-fractal Feature for Super-resolution Reconstruction with Fine Details Recovery$

Abstract:In the process of performing image super-resolution processing, the processing of complex localized information can have a significant impact on the quality of the image generated. Fractal features can capture the rich details of both micro and macro texture structures in an image. Therefore, we propose a diffusion model-based super-resolution method incorporating fractal features of low-resolution images, named MFSR. MFSR leverages these fractal features as reinforcement conditions in the denoising process of the diffusion model to ensure accurate recovery of texture information. MFSR employs convolution as a soft assignment to approximate the fractal features of low-resolution images. This approach is also used to approximate the density feature maps of these images. By using soft assignment, the spatial layout of the image is described hierarchically, encoding the self-similarity properties of the image at different scales. Different processing methods are applied to various types of features to enrich the information acquired by the model. In addition, a sub-denoiser is integrated in the denoising U-Net to reduce the noise in the feature maps during the up-sampling process in order to improve the quality of the generated images. Experiments conducted on various face and natural image datasets demonstrate that MFSR can generate higher quality images.

Via

Access Paper or Ask Questions

DeblurDiff: Real-World Image Deblurring with Generative Diffusion Models

Feb 06, 2025

Lingshun Kong, Jiawei Zhang, Dongqing Zou, Jimmy Ren, Xiaohe Wu, Jiangxin Dong, Jinshan Pan

Figure 1 for DeblurDiff: Real-World Image Deblurring with Generative Diffusion Models

Figure 2 for DeblurDiff: Real-World Image Deblurring with Generative Diffusion Models

Figure 3 for DeblurDiff: Real-World Image Deblurring with Generative Diffusion Models

Figure 4 for DeblurDiff: Real-World Image Deblurring with Generative Diffusion Models

Abstract:Diffusion models have achieved significant progress in image generation. The pre-trained Stable Diffusion (SD) models are helpful for image deblurring by providing clear image priors. However, directly using a blurry image or pre-deblurred one as a conditional control for SD will either hinder accurate structure extraction or make the results overly dependent on the deblurring network. In this work, we propose a Latent Kernel Prediction Network (LKPN) to achieve robust real-world image deblurring. Specifically, we co-train the LKPN in latent space with conditional diffusion. The LKPN learns a spatially variant kernel to guide the restoration of sharp images in the latent space. By applying element-wise adaptive convolution (EAC), the learned kernel is utilized to adaptively process the input feature, effectively preserving the structural information of the input. This process thereby more effectively guides the generative process of Stable Diffusion (SD), enhancing both the deblurring efficacy and the quality of detail reconstruction. Moreover, the results at each diffusion step are utilized to iteratively estimate the kernels in LKPN to better restore the sharp latent by EAC. This iterative refinement enhances the accuracy and robustness of the deblurring process. Extensive experimental results demonstrate that the proposed method outperforms state-of-the-art image deblurring methods on both benchmark and real-world images.

Via

Access Paper or Ask Questions

ExpRDiff: Short-exposure Guided Diffusion Model for Realistic Local Motion Deblurring

Dec 12, 2024

Zhongbao Yang, Jiangxin Dong, Jinhui Tang, Jinshan Pan

Figure 1 for ExpRDiff: Short-exposure Guided Diffusion Model for Realistic Local Motion Deblurring

Figure 2 for ExpRDiff: Short-exposure Guided Diffusion Model for Realistic Local Motion Deblurring

Figure 3 for ExpRDiff: Short-exposure Guided Diffusion Model for Realistic Local Motion Deblurring

Figure 4 for ExpRDiff: Short-exposure Guided Diffusion Model for Realistic Local Motion Deblurring

Abstract:Removing blur caused by moving objects is challenging, as the moving objects are usually significantly blurry while the static background remains clear. Existing methods that rely on local blur detection often suffer from inaccuracies and cannot generate satisfactory results when focusing solely on blurred regions. To overcome these problems, we first design a context-based local blur detection module that incorporates additional contextual information to improve the identification of blurry regions. Considering that modern smartphones are equipped with cameras capable of providing short-exposure images, we develop a blur-aware guided image restoration method that utilizes sharp structural details from short-exposure images, facilitating accurate reconstruction of heavily blurred regions. Furthermore, to restore images realistically and visually-pleasant, we develop a short-exposure guided diffusion model that explores useful features from short-exposure images and blurred regions to better constrain the diffusion process. Finally, we formulate the above components into a simple yet effective network, named ExpRDiff. Experimental results show that ExpRDiff performs favorably against state-of-the-art methods.

* Project website: https://github.com/yzb1997/ExpRDiff

Via

Access Paper or Ask Questions

FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration

Dec 02, 2024

Hao Li, Xiang Chen, Jiangxin Dong, Jinhui Tang, Jinshan Pan

Figure 1 for FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration

Figure 2 for FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration

Figure 3 for FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration

Figure 4 for FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration

Abstract:Despite the significant progress made by all-in-one models in universal image restoration, existing methods suffer from a generalization bottleneck in real-world scenarios, as they are mostly trained on small-scale synthetic datasets with limited degradations. Therefore, large-scale high-quality real-world training data is urgently needed to facilitate the emergence of foundational models for image restoration. To advance this field, we spare no effort in contributing a million-scale dataset with two notable advantages over existing training data: real-world samples with larger-scale, and degradation types with higher diversity. By adjusting internal camera settings and external imaging conditions, we can capture aligned image pairs using our well-designed data acquisition system over multiple rounds and our data alignment criterion. Moreover, we propose a robust model, FoundIR, to better address a broader range of restoration tasks in real-world scenarios, taking a further step toward foundation models. Specifically, we first utilize a diffusion-based generalist model to remove degradations by learning the degradation-agnostic common representations from diverse inputs, where incremental learning strategy is adopted to better guide model training. To refine the model's restoration capability in complex scenarios, we introduce degradation-aware specialist models for achieving final high-quality results. Extensive experiments show the value of our dataset and the effectiveness of our method.

* Project website: https://www.foundir.net

Via

Access Paper or Ask Questions

FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution

Nov 27, 2024

Junyang Chen, Jinshan Pan, Jiangxin Dong

Figure 1 for FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution

Figure 2 for FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution

Figure 3 for FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution

Figure 4 for FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution

Abstract:Faithful image super-resolution (SR) not only needs to recover images that appear realistic, similar to image generation tasks, but also requires that the restored images maintain fidelity and structural consistency with the input. To this end, we propose a simple and effective method, named FaithDiff, to fully harness the impressive power of latent diffusion models (LDMs) for faithful image SR. In contrast to existing diffusion-based SR methods that freeze the diffusion model pre-trained on high-quality images, we propose to unleash the diffusion prior to identify useful information and recover faithful structures. As there exists a significant gap between the features of degraded inputs and the noisy latent from the diffusion model, we then develop an effective alignment module to explore useful features from degraded inputs to align well with the diffusion process. Considering the indispensable roles and interplay of the encoder and diffusion model in LDMs, we jointly fine-tune them in a unified optimization framework, facilitating the encoder to extract useful features that coincide with diffusion process. Extensive experimental results demonstrate that FaithDiff outperforms state-of-the-art methods, providing high-quality and faithful SR results.

* Project page: https://jychen9811.github.io/FaithDiff_page/

Via

Access Paper or Ask Questions

Super-resolving Real-world Image Illumination Enhancement: A New Dataset and A Conditional Diffusion Model

Oct 16, 2024

Yang Liu, Yaofang Liu, Jinshan Pan, Yuxiang Hui, Fan Jia, Raymond H. Chan, Tieyong Zeng

Figure 1 for Super-resolving Real-world Image Illumination Enhancement: A New Dataset and A Conditional Diffusion Model

Figure 2 for Super-resolving Real-world Image Illumination Enhancement: A New Dataset and A Conditional Diffusion Model

Figure 3 for Super-resolving Real-world Image Illumination Enhancement: A New Dataset and A Conditional Diffusion Model

Figure 4 for Super-resolving Real-world Image Illumination Enhancement: A New Dataset and A Conditional Diffusion Model

Abstract:Most existing super-resolution methods and datasets have been developed to improve the image quality in well-lighted conditions. However, these methods do not work well in real-world low-light conditions as the images captured in such conditions lose most important information and contain significant unknown noises. To solve this problem, we propose a SRRIIE dataset with an efficient conditional diffusion probabilistic models-based method. The proposed dataset contains 4800 paired low-high quality images. To ensure that the dataset are able to model the real-world image degradation in low-illumination environments, we capture images using an ILDC camera and an optical zoom lens with exposure levels ranging from -6 EV to 0 EV and ISO levels ranging from 50 to 12800. We comprehensively evaluate with various reconstruction and perceptual metrics and demonstrate the practicabilities of the SRRIIE dataset for deep learning-based methods. We show that most existing methods are less effective in preserving the structures and sharpness of restored images from complicated noises. To overcome this problem, we revise the condition for Raw sensor data and propose a novel time-melding condition for diffusion probabilistic model. Comprehensive quantitative and qualitative experimental results on the real-world benchmark datasets demonstrate the feasibility and effectivenesses of the proposed conditional diffusion probabilistic model on Raw sensor data. Code and dataset will be available at https://github.com/Yaofang-Liu/Super-Resolving

* Code and dataset at https://github.com/Yaofang-Liu/Super-Resolving

Via

Access Paper or Ask Questions

Cascaded Temporal Updating Network for Efficient Video Super-Resolution

Aug 26, 2024

Hao Li, Jiangxin Dong, Jinshan Pan

Figure 1 for Cascaded Temporal Updating Network for Efficient Video Super-Resolution

Figure 2 for Cascaded Temporal Updating Network for Efficient Video Super-Resolution

Figure 3 for Cascaded Temporal Updating Network for Efficient Video Super-Resolution

Figure 4 for Cascaded Temporal Updating Network for Efficient Video Super-Resolution

Abstract:Existing video super-resolution (VSR) methods generally adopt a recurrent propagation network to extract spatio-temporal information from the entire video sequences, exhibiting impressive performance. However, the key components in recurrent-based VSR networks significantly impact model efficiency, e.g., the alignment module occupies a substantial portion of model parameters, while the bidirectional propagation mechanism significantly amplifies the inference time. Consequently, developing a compact and efficient VSR method that can be deployed on resource-constrained devices, e.g., smartphones, remains challenging. To this end, we propose a cascaded temporal updating network (CTUN) for efficient VSR. We first develop an implicit cascaded alignment module to explore spatio-temporal correspondences from adjacent frames. Moreover, we propose a unidirectional propagation updating network to efficiently explore long-range temporal information, which is crucial for high-quality video reconstruction. Specifically, we develop a simple yet effective hidden updater that can leverage future information to update hidden features during forward propagation, significantly reducing inference time while maintaining performance. Finally, we formulate all of these components into an end-to-end trainable VSR network. Extensive experimental results show that our CTUN achieves a favorable trade-off between efficiency and performance compared to existing methods. Notably, compared with BasicVSR, our method obtains better results while employing only about 30% of the parameters and running time. The source code and pre-trained models will be available at https://github.com/House-Leo/CTUN.

* Project website: https://github.com/House-Leo/CTUN

Via

Access Paper or Ask Questions