Fengjun Guo

NTIRE 2025 Challenge on Image Super-Resolution ($\times$4): Methods and Results

Apr 20, 2025

Zheng Chen, Kai Liu, Jue Gong, Jingkai Wang, Lei Sun, Zongwei Wu, Radu Timofte, Yulun Zhang, Xiangyu Kong, Xiaoxuan Yu(+101 more)

Abstract:This paper presents the NTIRE 2025 image super-resolution ($\times$4) challenge, one of the associated competitions of the 10th NTIRE Workshop at CVPR 2025. The challenge aims to recover high-resolution (HR) images from low-resolution (LR) counterparts generated through bicubic downsampling with a $\times$4 scaling factor. The objective is to develop effective network designs or solutions that achieve state-of-the-art SR performance. To reflect the dual objectives of image SR research, the challenge includes two sub-tracks: (1) a restoration track, which emphasizes pixel-wise accuracy and ranks submissions based on PSNR; (2) a perceptual track, which focuses on visual realism and ranks results by a perceptual score. A total of 286 participants registered for the competition, with 25 teams submitting valid entries. This report summarizes the challenge design, datasets, evaluation protocol, the main results, and methods of each team. The challenge serves as a benchmark to advance the state of the art and foster progress in image SR.

* NTIRE 2025 webpage: https://www.cvlai.net/ntire/2025. Code: https://github.com/zhengchen1999/NTIRE2025_ImageSR_x4
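
The restoration track described above ranks submissions by PSNR. As a rough illustration of that metric only, here is a minimal sketch that computes PSNR between a ground-truth image and a restored image. The function name `psnr`, the nested-list image representation, and the 8-bit peak value of 255 are assumptions made for this example; the challenge's exact protocol (color channel, border cropping, averaging over the test set) is not stated in this listing and is not reproduced here.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are nested lists: rows of pixel intensities.
    max_value is the assumed peak intensity (255 for 8-bit images).
    """
    ref_flat = [p for row in reference for p in row]
    out_flat = [p for row in restored for p in row]
    if len(ref_flat) != len(out_flat):
        raise ValueError("images must have the same number of pixels")

    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref_flat, out_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 3x3 grayscale patches (made-up values, not challenge data).
hr = [[52, 55, 61], [59, 79, 61], [76, 62, 59]]
sr = [[54, 55, 60], [59, 77, 61], [75, 62, 60]]
print(f"PSNR: {psnr(hr, sr):.2f} dB")
```

Higher values mean the restored image is closer to the reference pixel by pixel, which is why the metric suits the restoration track but not the perceptual one.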

Omni-IML: Towards Unified Image Manipulation Localization

Nov 22, 2024

Chenfan Qu, Yiwu Zhong, Fengjun Guo, Lianwen Jin

Abstract:Image manipulation can lead to misinterpretation of visual content, posing significant risks to information security. Image Manipulation Localization (IML) has thus received increasing attention. However, existing IML methods rely heavily on task-specific designs, so they perform well on only one target image type and amount to little more than random guessing on others; even joint training on multiple image types causes significant performance degradation. This hinders deployment in real applications, as it notably increases maintenance costs, and misclassification of image types leads to serious error accumulation. To this end, we propose Omni-IML, the first generalist model to unify diverse IML tasks. Specifically, Omni-IML achieves generalism by adopting the Modal Gate Encoder and the Dynamic Weight Decoder to adaptively determine the optimal encoding modality and the optimal decoder filters for each sample. We additionally propose an Anomaly Enhancement module that enhances the features of tampered regions with box supervision and helps the generalist model to extract common features across different IML tasks. We validate our approach on IML tasks across three major scenarios: natural images, document images, and face images. Without bells and whistles, our Omni-IML achieves state-of-the-art performance on all three tasks with a single unified model, providing valuable strategies and insights for real-world application and future research in generalist image forensics. Our code will be publicly available.

Generalized Tampered Scene Text Detection in the era of Generative AI

Jul 31, 2024

Chenfan Qu, Yiwu Zhong, Fengjun Guo, Lianwen Jin

Abstract:The rapid advancements of generative AI have fueled the potential of generative text image editing while simultaneously escalating the threat of misinformation spreading. However, existing forensics methods struggle to detect unseen forgery types that they have not been trained on, leaving the development of a model capable of generalized detection of tampered scene text as an unresolved issue. To tackle this, we propose a novel task: open-set tampered scene text detection, which evaluates forensics models on their ability to identify both seen and previously unseen forgery types. We have curated a comprehensive, high-quality dataset, featuring the texts tampered by eight text editing models, to thoroughly assess the open-set generalization capabilities. Further, we introduce a novel and effective pre-training paradigm that subtly alters the texture of selected texts within an image and trains the model to identify these regions. This approach not only mitigates the scarcity of high-quality training data but also enhances models' fine-grained perception and open-set generalization abilities. Additionally, we present DAF, a novel framework that improves open-set generalization by distinguishing between the features of authentic and tampered text, rather than focusing solely on the tampered text's features. Our extensive experiments validate the remarkable efficacy of our methods. For example, our zero-shot performance can even beat the previous state-of-the-art full-shot model by a large margin. Our dataset and code will be open-source.

SemanticMIM: Marring Masked Image Modeling with Semantics Compression for General Visual Representation

Jun 15, 2024

Yike Yuan, Huanzhang Dou, Fengjun Guo, Xi Li

Abstract:This paper presents a neat yet effective framework, named SemanticMIM, to integrate the advantages of masked image modeling (MIM) and contrastive learning (CL) for general visual representation. We conduct a thorough comparative analysis between CL and MIM, revealing that their complementary advantages fundamentally stem from two distinct phases, i.e., compression and reconstruction. Specifically, SemanticMIM leverages a proxy architecture that customizes interaction between image and mask tokens, bridging these two phases to achieve general visual representation with the property of abundant semantic and positional awareness. Through extensive qualitative and quantitative evaluations, we demonstrate that SemanticMIM effectively amalgamates the benefits of CL and MIM, leading to significant enhancement of performance and feature linear separability. SemanticMIM also offers notable interpretability through attention response visualization. Code is available at https://github.com/yyk-wew/SemanticMIM.

UPOCR: Towards Unified Pixel-Level OCR Interface

Dec 05, 2023

Dezhi Peng, Zhenhua Yang, Jiaxin Zhang, Chongyu Liu, Yongxin Shi, Kai Ding, Fengjun Guo, Lianwen Jin

Abstract:In recent years, the optical character recognition (OCR) field has been proliferating with plentiful cutting-edge approaches for a wide spectrum of tasks. However, these approaches are task-specifically designed with divergent paradigms, architectures, and training strategies, which significantly increases the complexity of research and maintenance and hinders the fast deployment in applications. To this end, we propose UPOCR, a simple-yet-effective generalist model for Unified Pixel-level OCR interface. Specifically, the UPOCR unifies the paradigm of diverse OCR tasks as image-to-image transformation and the architecture as a vision Transformer (ViT)-based encoder-decoder. Learnable task prompts are introduced to push the general feature representations extracted by the encoder toward task-specific spaces, endowing the decoder with task awareness. Moreover, the model training is uniformly aimed at minimizing the discrepancy between the generated and ground-truth images regardless of the inhomogeneity among tasks. Experiments are conducted on three pixel-level OCR tasks including text removal, text segmentation, and tampered text detection. Without bells and whistles, the experimental results showcase that the proposed method can simultaneously achieve state-of-the-art performance on three tasks with a unified single model, which provides valuable strategies and insights for future research on generalist OCR models. Code will be publicly available.

Deep Image Harmonization with Globally Guided Feature Transformation and Relation Distillation

Aug 01, 2023

Li Niu, Linfeng Tan, Xinhao Tao, Junyan Cao, Fengjun Guo, Teng Long, Liqing Zhang

Abstract:Given a composite image, image harmonization aims to adjust the foreground illumination to be consistent with the background. Previous methods have explored transforming foreground features to achieve competitive performance. In this work, we show that using global information to guide foreground feature transformation can achieve significant improvement. Besides, we propose to transfer the foreground-background relation from real images to composite images, which can provide intermediate supervision for the transformed encoder features. Additionally, considering the drawbacks of existing harmonization datasets, we also contribute the ccHarmony dataset, which simulates natural illumination variation. Extensive experiments on iHarmony4 and our contributed dataset demonstrate the superiority of our method. Our ccHarmony dataset is released at https://github.com/bcmi/Image-Harmonization-Dataset-ccHarmony.

* Accepted by ICCV 2023

DocAligner: Annotating Real-world Photographic Document Images by Simply Taking Pictures

Jun 12, 2023

Jiaxin Zhang, Bangdong Chen, Hiuyi Cheng, Fengjun Guo, Kai Ding, Lianwen Jin

Abstract:Recently, there has been a growing interest in research concerning document image analysis and recognition in photographic scenarios. However, the lack of labeled datasets for this emerging challenge poses a significant obstacle, as manual annotation can be time-consuming and impractical. To tackle this issue, we present DocAligner, a novel method that streamlines the manual annotation process to a simple step of taking pictures. DocAligner achieves this by establishing dense correspondence between photographic document images and their clean counterparts. It enables the automatic transfer of existing annotations in clean document images to photographic ones and helps to automatically acquire labels that are unavailable through manual labeling. Considering the distinctive characteristics of document images, DocAligner incorporates several innovative features. First, we propose a non-rigid pre-alignment technique based on the document's edges, which effectively eliminates interference caused by significant global shifts and repetitive patterns present in document images. Second, to handle large shifts and ensure high accuracy, we introduce a hierarchical aligning approach that combines global and local correlation layers. Furthermore, considering the importance of fine-grained elements in document images, we present a details recurrent refinement module to enhance the output in a high-resolution space. To train DocAligner, we construct a synthetic dataset and introduce a self-supervised learning approach to enhance its robustness for real-world data. Through extensive experiments, we demonstrate the effectiveness of DocAligner and the acquired dataset. Datasets and codes will be publicly available.

Inharmonious Region Localization by Magnifying Domain Discrepancy

Sep 30, 2022

Jing Liang, Li Niu, Penghao Wu, Fengjun Guo, Teng Long

Abstract:Inharmonious region localization aims to localize the region in a synthetic image that is incompatible with the surrounding background. The inharmony issue is mainly attributed to the color and illumination inconsistency produced by image editing techniques. In this work, we propose to transform the input image to another color space to magnify the domain discrepancy between the inharmonious region and the background, so that the model can identify the inharmonious region more easily. To this end, we present a novel framework consisting of a color mapping module and an inharmonious region localization network, in which the former is equipped with a novel domain discrepancy magnification loss and the latter could be an arbitrary localization network. Extensive experiments on an image harmonization dataset show the superiority of our designed framework. Our code is available at https://github.com/bcmi/MadisNet-Inharmonious-Region-Localization.

Marior: Margin Removal and Iterative Content Rectification for Document Dewarping in the Wild

Jul 23, 2022

Jiaxin Zhang, Canjie Luo, Lianwen Jin, Fengjun Guo, Kai Ding

Abstract:Camera-captured document images usually suffer from perspective and geometric deformations. It is of great value to rectify them when considering poor visual aesthetics and the deteriorated performance of OCR systems. Recent learning-based methods intensively focus on the accurately cropped document image. However, this might not be sufficient for overcoming practical challenges, including document images either with large marginal regions or without margins. Due to this impracticality, users struggle to crop documents precisely when they encounter large marginal regions. Simultaneously, dewarping images without margins is still an insurmountable problem. To the best of our knowledge, there is still no complete and effective pipeline for rectifying document images in the wild. To address this issue, we propose a novel approach called Marior (Margin Removal and Iterative Content Rectification). Marior follows a progressive strategy to iteratively improve the dewarping quality and readability in a coarse-to-fine manner. Specifically, we divide the pipeline into two modules: margin removal module (MRM) and iterative content rectification module (ICRM). First, we predict the segmentation mask of the input image to remove the margin, thereby obtaining a preliminary result. Then we refine the image further by producing dense displacement flows to achieve content-aware rectification. We determine the number of refinement iterations adaptively. Experiments demonstrate the state-of-the-art performance of our method on public benchmarks. The resources are available at https://github.com/ZZZHANG-jx/Marior for further comparison.

* This paper has been accepted by ACM Multimedia 2022
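
The Marior abstract mentions refining the image with dense displacement flows. To make that idea concrete, here is a minimal sketch of backward warping with a per-pixel displacement field. It is not Marior's implementation: the function name `warp_with_flow`, the `(dx, dy)` convention, nearest-neighbor sampling, and border clamping are all assumptions chosen to keep the example short.

```python
def warp_with_flow(image, flow):
    """Backward-warp an image with a dense displacement field.

    image: nested list of pixel values, image[y][x].
    flow:  nested list of (dx, dy) pairs, one per output pixel.
    Each output pixel (x, y) is read from source position (x + dx, y + dy),
    rounded to the nearest pixel and clamped to the image border.
    """
    height, width = len(image), len(image[0])
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = flow[y][x]
            src_x = min(max(int(round(x + dx)), 0), width - 1)
            src_y = min(max(int(round(y + dy)), 0), height - 1)
            row.append(image[src_y][src_x])
        output.append(row)
    return output


# Toy example: every output pixel samples the pixel one step to its right.
image = [[1, 2, 3],
         [4, 5, 6]]
shift_right = [[(1, 0)] * 3 for _ in range(2)]
warped = warp_with_flow(image, shift_right)
print(warped)
```

An iterative scheme in the spirit of the abstract would predict a new flow from each warped result and warp again; how the flows are predicted and when to stop are left to the paper.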

Don't Forget Me: Accurate Background Recovery for Text Removal via Modeling Local-Global Context

Jul 21, 2022

Chongyu Liu, Lianwen Jin, Yuliang Liu, Canjie Luo, Bangdong Chen, Fengjun Guo, Kai Ding

Abstract:Text removal has attracted increasing attention due to its various applications in privacy protection, document restoration, and text editing. It has shown significant progress with deep neural networks. However, most existing methods generate inconsistent results for complex backgrounds. To address this issue, we propose a Contextual-guided Text Removal Network, termed CTRNet. CTRNet explores both low-level structure and high-level discriminative context features as prior knowledge to guide the process of background restoration. We further propose a Local-global Content Modeling (LGCM) block with CNNs and Transformer-Encoder to capture local features and establish the long-term relationship among pixels globally. Finally, we incorporate LGCM with context guidance for feature modeling and decoding. Experiments on the benchmark datasets SCUT-EnsText and SCUT-Syn show that CTRNet significantly outperforms the existing state-of-the-art methods. Furthermore, a qualitative experiment on examination papers also demonstrates the generalization ability of our method. The code and supplementary materials are available at https://github.com/lcy0604/CTRNet.
