jin Ma

Pixel Adapter: A Graph-Based Post-Processing Approach for Scene Text Image Super-Resolution

Sep 16, 2023

Wenyu Zhang, Xin Deng, Baojun Jia, Xingtong Yu, Yifan Chen, jin Ma, Qing Ding, Xinming Zhang

Abstract: Current scene text image super-resolution approaches primarily focus on extracting robust features, acquiring text information, and designing complex training strategies to generate super-resolution images. However, the upsampling module, which is crucial for converting low-resolution images to high-resolution ones, has received little attention in existing work. We propose the Pixel Adapter Module (PAM), based on graph attention, to address the pixel distortion caused by upsampling. The PAM captures local structural information by allowing each pixel to interact with its neighbors and update its features. Unlike previous graph attention mechanisms, our approach achieves a 2-3 orders of magnitude improvement in efficiency and memory utilization by eliminating the dependency on sparse adjacency matrices and introducing a sliding-window approach for efficient parallel computation. Additionally, we introduce the MLP-based Sequential Residual Block (MSRB) for robust feature extraction from text images, and a Local Contour Awareness loss ($\mathcal{L}_{lca}$) to enhance the model's perception of details. Comprehensive experiments on TextZoom demonstrate that our method generates high-quality super-resolution images and surpasses existing methods in recognition accuracy. For the single-stage and multi-stage strategies, we improve accuracy by 0.7 and 2.6 percentage points, from 52.6% and 53.7% to 53.3% and 56.3%, respectively. The code is available at https://github.com/wenyu1009/RTSRN.
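
The sliding-window idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' PAM implementation (see the linked repository for that); the function name `local_window_attention`, the 3x3 default window, the zero padding, and the plain scaled dot-product scoring are all assumptions chosen for illustration. It only shows how each pixel can attend to its spatial neighbors without building an adjacency matrix.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_window_attention(features: np.ndarray, window: int = 3) -> np.ndarray:
    """Hypothetical sketch: update each pixel from its window x window neighbors.

    features: array of shape (H, W, C). Returns an array of the same shape.
    """
    if window % 2 != 1:
        raise ValueError("window must be odd")
    h, w, c = features.shape
    pad = window // 2
    k = window * window

    # Zero-pad the borders so every pixel has a full neighborhood.
    padded = np.pad(features, ((pad, pad), (pad, pad), (0, 0)))
    # Shape (H, W, C, window, window): one neighborhood per pixel,
    # gathered by a sliding window instead of an adjacency matrix.
    patches = sliding_window_view(padded, (window, window), axis=(0, 1))
    neighbors = patches.reshape(h, w, c, k).transpose(0, 1, 3, 2)  # (H, W, K, C)

    # Mark which neighbors are real pixels rather than padding.
    ones = np.pad(np.ones((h, w)), pad)
    valid = sliding_window_view(ones, (window, window)).reshape(h, w, k) > 0

    # Scaled dot-product score between each pixel and each of its neighbors.
    scores = np.einsum("hwc,hwkc->hwk", features, neighbors) / np.sqrt(c)
    scores = np.where(valid, scores, -np.inf)

    # Softmax over the K neighbors (the center pixel is always valid).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Each output pixel is an attention-weighted mix of its neighbors.
    return np.einsum("hwk,hwkc->hwc", weights, neighbors)
```

In this sketch the neighborhood tensor holds H·W·K feature vectors, where K is the window area, whereas a dense adjacency matrix over the same pixels would have (H·W)² entries. The sketch makes no claim about reproducing the efficiency figures reported in the abstract.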

* ACM Multimedia 2023
