Qiaosi Yi

Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach

Dec 04, 2024

Lingchen Sun, Rongyuan Wu, Zhiyuan Ma, Shuaizheng Liu, Qiaosi Yi, Lei Zhang

Abstract: Diffusion prior-based methods have shown impressive results in real-world image super-resolution (SR). However, most existing methods entangle pixel-level and semantic-level SR objectives in the training process and struggle to balance pixel-wise fidelity and perceptual quality. Meanwhile, users have varying preferences for SR results, so there is a need for an adjustable SR model that can be tailored to different fidelity-perception preferences during inference without re-training. We present Pixel-level and Semantic-level Adjustable SR (PiSA-SR), which learns two LoRA modules upon the pre-trained stable-diffusion (SD) model to achieve improved and adjustable SR results. We first formulate the SD-based SR problem as learning the residual between the low-quality input and the high-quality output, then show that the learning objective can be decoupled into two distinct LoRA weight spaces: one is characterized by the $\ell_2$-loss for pixel-level regression, and the other is characterized by the LPIPS and classifier score distillation losses to extract semantic information from pre-trained classification and SD models. In its default setting, PiSA-SR can be performed in a single diffusion step, achieving leading real-world SR results in both quality and efficiency. By introducing two adjustable guidance scales on the two LoRA modules to control the strengths of pixel-wise fidelity and semantic-level details during inference, PiSA-SR can offer flexible SR results according to user preference without re-training. Code and models can be found at https://github.com/csslc/PiSA-SR.
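The idea of two guidance scales acting on a residual formulation can be pictured with a small sketch. It assumes, purely for illustration, that each LoRA module contributes its own residual to be subtracted from the low-quality input and that each residual is weighted by its own scale. The function name, the scale names `lam_pix` and `lam_sem`, and the toy numbers are hypothetical and are not taken from the PiSA-SR code.

```python
from typing import List


def adjustable_sr(
    z_lq: List[float],
    res_pix: List[float],
    res_sem: List[float],
    lam_pix: float = 1.0,
    lam_sem: float = 1.0,
) -> List[float]:
    """Subtract two independently scaled residuals from the low-quality input.

    Assumption: res_pix stands in for the pixel-level module's residual and
    res_sem for the semantic-level module's residual. Both are toy vectors.
    """
    return [
        z - lam_pix * rp - lam_sem * rs
        for z, rp, rs in zip(z_lq, res_pix, res_sem)
    ]


# Toy values chosen to be exactly representable in floating point.
z_lq = [0.5, 0.25, 1.0]
res_pix = [0.25, 0.0, 0.5]
res_sem = [0.125, -0.25, 0.0]

default = adjustable_sr(z_lq, res_pix, res_sem)                     # both scales at 1
fidelity_only = adjustable_sr(z_lq, res_pix, res_sem, lam_sem=0.0)  # semantic term off
```

Setting both scales to zero returns the input unchanged, and raising one scale strengthens only the corresponding residual, which is the kind of inference-time control the abstract describes.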

Multi-Scale Representation Learning for Image Restoration with State-Space Model

Aug 19, 2024

Yuhong He, Long Peng, Qiaosi Yi, Chen Wu, Lu Wang

Abstract: Image restoration endeavors to reconstruct a high-quality, detail-rich image from a degraded counterpart, which is a pivotal process in photography and various computer vision systems. In real-world scenarios, different types of degradation can cause the loss of image details at various scales and degrade image contrast. Existing methods predominantly rely on CNNs and Transformers to capture multi-scale representations. However, these methods are often limited by the high computational complexity of Transformers and the constrained receptive field of CNNs, which hinder them from achieving superior performance and efficiency in image restoration. To address these challenges, we propose a novel Multi-Scale State-Space Model-based method (MS-Mamba) for efficient image restoration that enhances the capacity for multi-scale representation learning through our proposed global and regional SSM modules. Additionally, an Adaptive Gradient Block (AGB) and a Residual Fourier Block (RFB) are proposed to improve the network's detail extraction capabilities by capturing gradients in various directions and facilitating the learning of details in the frequency domain. Extensive experiments on nine public benchmarks across four classic image restoration tasks (image deraining, dehazing, denoising, and low-light enhancement) demonstrate that our proposed method achieves new state-of-the-art performance while maintaining low computational complexity. The source code will be publicly available.

NTIRE 2024 Restore Any Image Model in the Wild Challenge

May 16, 2024

Jie Liang, Radu Timofte, Qiaosi Yi, Shuaizheng Liu, Lingchen Sun, Rongyuan Wu, Xindong Zhang, Hui Zeng, Lei Zhang

Abstract: In this paper, we review the NTIRE 2024 challenge on Restore Any Image Model (RAIM) in the Wild. The RAIM challenge constructed a benchmark for image restoration in the wild, including real-world images with and without reference ground truth in various scenarios from real applications. The participants were required to restore real-captured images from complex and unknown degradation, where both generative perceptual quality and fidelity are desired in the restoration result. The challenge consisted of two tasks. Task one employed real referenced data pairs, for which quantitative evaluation is available. Task two used unpaired images, and a comprehensive user study was conducted. The challenge attracted more than 200 registrations, 39 of which submitted results, with more than 400 submissions in total. Top-ranked methods improved the state-of-the-art restoration performance and obtained unanimous recognition from all 18 judges. The proposed datasets are available at https://drive.google.com/file/d/1DqbxUoiUqkAIkExu3jZAqoElr_nu1IXb/view?usp=sharing and the homepage of this challenge is at https://codalab.lisn.upsaclay.fr/competitions/17632.

Textual Prompt Guided Image Restoration

Dec 11, 2023

Qiuhai Yan, Aiwen Jiang, Kang Chen, Long Peng, Qiaosi Yi, Chunjie Zhang

Abstract: Image restoration has always been a cutting-edge topic in the academic and industrial fields of computer vision. Since degradation signals are often random and diverse, "all-in-one" models that can perform blind image restoration have attracted attention in recent years. Early works require training specialized headers and tails to handle each degradation of concern, which is cumbersome. Recent works focus on learning visual prompts from the data distribution to identify the degradation type. However, the prompts employed in most models are non-textual and place insufficient emphasis on the human-in-the-loop. In this paper, an effective textual prompt guided image restoration model is proposed. In this model, a task-specific BERT is fine-tuned to accurately understand the user's instructions and generate textual prompt guidance. Depth-wise multi-head transposed attention and gated convolution modules are designed to bridge the gap between textual prompts and visual features. The proposed model introduces semantic prompts into the low-level visual domain. It highlights the potential to provide a natural, precise, and controllable way to perform image restoration tasks. Extensive experiments have been conducted on public denoising, dehazing, and deraining datasets. The experimental results demonstrate that, compared with popular state-of-the-art methods, the proposed model obtains superior performance, achieving accurate recognition and removal of degradation without increasing the model's complexity. Related source code and data will be publicly available on GitHub at https://github.com/MoTong-AI-studio/TextPromptIR.

* 12 pages, 10 figures

Reducing Capacity Gap in Knowledge Distillation with Review Mechanism for Crowd Counting

Jun 11, 2022

Yunxin Liu, Qiaosi Yi, Jinshan Zeng

Abstract: Lightweight crowd counting models, in particular knowledge distillation (KD) based models, have attracted rising attention in recent years due to their advantages in computational efficiency and hardware requirements. However, existing KD-based models usually suffer from the capacity gap issue, which causes the performance of the student network to be limited by the teacher network. In this paper, we address this issue by introducing a novel review mechanism following KD models, motivated by the way humans review material while studying. The proposed model is therefore dubbed ReviewKD. It consists of an instruction phase and a review phase: in the instruction phase, we first exploit a well-trained heavy teacher network to transfer its latent features to a lightweight student network; in the review phase, we then yield a refined estimate of the density map based on the learned features through a review mechanism. The effectiveness of ReviewKD is demonstrated by a set of experiments over six benchmark datasets in comparison with state-of-the-art models. Numerical results show that ReviewKD outperforms existing lightweight models for crowd counting, effectively alleviates the capacity gap issue, and in particular achieves performance beyond that of the teacher network. Besides lightweight models, we also show that the suggested review mechanism can be used as a plug-and-play module to further boost the performance of certain heavy crowd counting models without modifying the neural network architecture or introducing any additional model parameters.

Multiple Degradation and Reconstruction Network for Single Image Denoising via Knowledge Distillation

Apr 29, 2022

Juncheng Li, Hanhui Yang, Qiaosi Yi, Faming Fang, Guangwei Gao, Tieyong Zeng, Guixu Zhang

Abstract: Single image denoising (SID) has achieved significant breakthroughs with the development of deep learning. However, the proposed methods often come with a large number of parameters, which greatly limits their application scenarios. Different from previous works that blindly increase the depth of the network, we explore the degradation mechanism of the noisy image and propose a lightweight Multiple Degradation and Reconstruction Network (MDRN) to progressively remove noise. Meanwhile, we propose two novel Heterogeneous Knowledge Distillation Strategies (HMDS) to enable MDRN to learn richer and more accurate features from heterogeneous models, which makes it possible to reconstruct higher-quality denoised images under extreme conditions. Extensive experiments show that our MDRN achieves favorable performance against other SID models with fewer parameters. Meanwhile, extensive ablation studies demonstrate that the introduced HMDS can improve the performance of tiny models or of a model under high noise levels, which is extremely useful for related applications.

* Accepted by CVPR Workshop 2022

Contrastive Learning for Local and Global Learning MRI Reconstruction

Nov 30, 2021

Qiaosi Yi, Jinhao Liu, Le Hu, Faming Fang, Guixu Zhang

Abstract: Magnetic Resonance Imaging (MRI) is an important medical imaging modality, but it requires a long acquisition time. To reduce the acquisition time, various methods have been proposed. However, these methods fail to reconstruct images with a clear structure for two main reasons. Firstly, similar patches widely exist in MR images, while most previous deep learning-based methods ignore this property and only adopt CNNs to learn local information. Secondly, the existing methods only use clear images to constrain the upper bound of the solution space, while the lower bound is not constrained, so that better network parameters cannot be obtained. To address these problems, we propose a Contrastive Learning for Local and Global Learning MRI Reconstruction Network (CLGNet). Specifically, according to Fourier theory, each value in the Fourier domain is calculated from all the values in the spatial domain. Therefore, we propose a Spatial and Fourier Layer (SFL) to simultaneously learn the local and global information in the spatial and Fourier domains. Moreover, compared with self-attention and transformers, the SFL has a stronger learning ability and can achieve better performance in less time. Based on the SFL, we design a Spatial and Fourier Residual block as the main component of our model. Meanwhile, to constrain both the lower bound and the upper bound of the solution space, we introduce contrastive learning, which pulls the result closer to the clear image and pushes it further away from the undersampled image. Extensive experimental results on different datasets and acceleration rates demonstrate that the proposed CLGNet achieves new state-of-the-art results.
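The Fourier property the abstract relies on (every Fourier coefficient depends on every spatial sample) can be checked with a tiny discrete Fourier transform. The sketch below is a plain 1-D DFT written for illustration only; it is not the authors' Spatial and Fourier Layer, and the signal values are made up.

```python
import cmath
from typing import List


def dft(x: List[float]) -> List[complex]:
    """Naive discrete Fourier transform: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


signal = [1.0, 2.0, 3.0, 4.0]

# Perturb a single spatial sample and leave the others untouched.
perturbed = signal.copy()
perturbed[2] += 1.0

before = dft(signal)
after = dft(perturbed)

# Magnitude of the change in each frequency coefficient.
changes = [abs(a - b) for a, b in zip(after, before)]
```

Every entry of `changes` is nonzero: a purely local edit in the spatial domain touches all frequency coefficients, which is why an operation applied in the Fourier domain has a global receptive field.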

Structure-Preserving Deraining with Residue Channel Prior Guidance

Aug 20, 2021

Qiaosi Yi, Juncheng Li, Qinyan Dai, Faming Fang, Guixu Zhang, Tieyong Zeng

Abstract: Single image deraining is important for many high-level computer vision tasks, since rain streaks can severely degrade the visibility of images and thereby affect image recognition and analysis. Recently, many CNN-based methods have been proposed for rain removal. Although these methods can remove part of the rain streaks, it is difficult for them to adapt to real-world scenarios and restore high-quality rain-free images with clear and accurate structures. To solve this problem, we propose a Structure-Preserving Deraining Network (SPDNet) with Residue Channel Prior (RCP) guidance. SPDNet directly generates high-quality rain-free images with clear and accurate structures under the guidance of RCP, without relying on any rain-generating assumptions. Specifically, we found that the RCP of an image contains more accurate structural information than the rainy image itself. Therefore, we introduce it into our deraining network to protect the structural information of the rain-free image. Meanwhile, a Wavelet-based Multi-Level Module (WMLM) is proposed as the backbone for learning the background information of rainy images, and an Interactive Fusion Module (IFM) is designed to make full use of RCP information. In addition, an iterative guidance strategy is proposed to gradually improve the accuracy of RCP, refining the result progressively. Extensive experimental results on both synthetic and real-world datasets demonstrate that the proposed model achieves new state-of-the-art results. Code: https://github.com/Joyies/SPDNet
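The abstract does not define the residue channel. A minimal sketch, assuming the common definition from the deraining literature (the per-pixel difference between the largest and smallest colour channel), is shown here; the function name and the toy image are hypothetical and this is not the SPDNet implementation.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (R, G, B) values in [0, 1]


def residue_channel(image: List[List[Pixel]]) -> List[List[float]]:
    """Per-pixel max(R, G, B) - min(R, G, B).

    Assumption: this max-minus-min definition is what the residue channel
    refers to; the abstract above does not spell it out.
    """
    return [[max(p) - min(p) for p in row] for row in image]


# Toy 2x2 image: the grey pixels stand in for roughly colourless rain streaks,
# the coloured pixels stand in for background structure.
image = [
    [(0.5, 0.5, 0.5), (0.75, 0.25, 0.5)],
    [(1.0, 1.0, 1.0), (0.25, 0.5, 1.0)],
]

rcp = residue_channel(image)
```

Under this assumption, colourless pixels map to zero while coloured background pixels keep a nonzero response, which gives one intuition for why such a map can carry cleaner structure than the rainy image.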

Feedback Network for Mutually Boosted Stereo Image Super-Resolution and Disparity Estimation

Jun 02, 2021

Qinyan Dai, Juncheng Li, Qiaosi Yi, Faming Fang, Guixu Zhang

Abstract: Under stereo settings, the problems of image super-resolution (SR) and disparity estimation are interrelated, in that the result of each problem can help to solve the other. The effective exploitation of correspondence between different views improves SR performance, while high-resolution (HR) features with richer details benefit correspondence estimation. Based on this motivation, we propose a Stereo Super-Resolution and Disparity Estimation Feedback Network (SSRDE-FNet), which simultaneously handles stereo image super-resolution and disparity estimation in a unified framework and lets the two tasks interact with each other to further improve their performance. Specifically, the SSRDE-FNet is composed of two dual recursive sub-networks for the left and right views. Besides the cross-view information exploitation in the low-resolution (LR) space, HR representations produced by the SR process are utilized to perform HR disparity estimation with higher accuracy, through which the HR features can be aggregated to generate a finer SR result. Afterward, the proposed HR Disparity Information Feedback (HRDIF) mechanism delivers information carried by the HR disparity back to previous layers to further refine the SR image reconstruction. Extensive experiments demonstrate the effectiveness and advancement of SSRDE-FNet.

Efficient and Accurate Multi-scale Topological Network for Single Image Dehazing

Feb 24, 2021

Qiaosi Yi, Juncheng Li, Faming Fang, Aiwen Jiang, Guixu Zhang

Abstract: Single image dehazing is a challenging ill-posed problem that has drawn significant attention in the last few years. Recently, convolutional neural networks have achieved great success in image dehazing. However, it is still difficult for these increasingly complex models to recover accurate details from the hazy image. In this paper, we pay attention to the feature extraction and utilization of the input image itself. To achieve this, we propose a Multi-scale Topological Network (MSTN) to fully explore the features at different scales. Meanwhile, we design a Multi-scale Feature Fusion Module (MFFM) and an Adaptive Feature Selection Module (AFSM) to achieve the selection and fusion of features at different scales, so as to achieve progressive image dehazing. This topological network provides a large number of search paths, which enable the network to extract abundant image features and give it strong fault tolerance and robustness. In addition, AFSM and MFFM can adaptively select important features and ignore interfering information when fusing representations at different scales. Extensive experiments are conducted to demonstrate the superiority of our method compared with state-of-the-art methods.
