Yangliu Xu

MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining

Jun 01, 2022

Pengyuan Lyu, Chengquan Zhang, Shanshan Liu, Meina Qiao, Yangliu Xu, Liang Wu, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang

Abstract: In this paper, we present a model pretraining technique, named MaskOCR, for text recognition. Our text recognition architecture is an encoder-decoder transformer: the encoder extracts patch-level representations, and the decoder recognizes the text from those representations. Our approach pretrains both the encoder and the decoder sequentially. (i) We pretrain the encoder in a self-supervised manner on a large set of unlabeled real text images. We adopt the masked image modeling approach, which has proven effective for general images, expecting the representations to capture semantic information. (ii) We pretrain the decoder in a supervised manner on a large set of synthesized text images, and we enhance its language modeling capability by randomly masking some of the character-occupied text image patches input to the encoder, and correspondingly the representations input to the decoder. Experiments show that the proposed MaskOCR approach achieves superior results on benchmark datasets of both Chinese and English text images.
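The two masking strategies described in the abstract can be illustrated with a small sketch. It assumes a text-line image has already been split into a sequence of patches, and that for synthesized images the patch index range covered by each character is known. The function names (`random_patch_mask`, `char_patch_mask`, `apply_mask`), the mask ratio, and the per-character masking probability are hypothetical choices for illustration, not the authors' implementation or hyperparameters.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_patch_mask(num_patches: int, mask_ratio: float,
                      seed: Optional[int] = None) -> List[bool]:
    """Encoder pretraining (hypothetical sketch): hide a random subset of patches.

    Returns a boolean list where True means the patch is masked.
    """
    rng = random.Random(seed)
    num_masked = round(mask_ratio * num_patches)
    masked_idx = set(rng.sample(range(num_patches), num_masked))
    return [i in masked_idx for i in range(num_patches)]


def char_patch_mask(char_spans: Sequence[Tuple[int, int]], num_patches: int,
                    char_mask_prob: float, seed: Optional[int] = None) -> List[bool]:
    """Decoder pretraining (hypothetical sketch): mask whole characters.

    char_spans holds one (start, end) patch range per character, end exclusive.
    Each character is masked independently with probability char_mask_prob,
    so every patch it occupies is hidden together.
    """
    rng = random.Random(seed)
    mask = [False] * num_patches
    for start, end in char_spans:
        if rng.random() < char_mask_prob:
            for i in range(start, end):
                mask[i] = True
    return mask


def apply_mask(patches: Sequence[T], mask: Sequence[bool]) -> List[T]:
    """Keep only the visible (unmasked) patches."""
    return [p for p, m in zip(patches, mask) if not m]


if __name__ == "__main__":
    mask = random_patch_mask(num_patches=16, mask_ratio=0.5, seed=0)
    print(sum(mask), "of", len(mask), "patches masked")
    spans = [(0, 3), (3, 7), (7, 10)]  # hypothetical character extents
    print(char_patch_mask(spans, num_patches=12, char_mask_prob=0.3, seed=1))
```

Masking whole characters rather than scattered patches removes all visual evidence for a character, so the decoder can only recover it from the surrounding context, which is one plausible reading of how such masking strengthens language modeling.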

AON: Towards Arbitrarily-Oriented Text Recognition

Mar 22, 2018

Zhanzhan Cheng, Yangliu Xu, Fan Bai, Yi Niu, Shiliang Pu, Shuigeng Zhou

Abstract: Recognizing text in natural images is an active research topic in computer vision because of its many applications. Despite decades of research on optical character recognition (OCR), recognizing text in natural images remains challenging. This is because scene text often appears in irregular arrangements (e.g., curved, arbitrarily oriented, or severely distorted), which have not yet been well addressed in the literature. Existing text recognition methods mainly handle regular (horizontal and frontal) text and cannot be trivially generalized to irregular text. In this paper, we develop the arbitrary orientation network (AON) to directly capture the deep features of irregular text, which are fed into an attention-based decoder that generates the character sequence. The whole network can be trained end-to-end using only images and word-level annotations. Extensive experiments on various benchmarks, including the CUTE80, SVT-Perspective, IIIT5k, SVT, and ICDAR datasets, show that the proposed AON-based method achieves state-of-the-art performance on irregular datasets and is comparable to major existing methods on regular datasets.

* Accepted by CVPR 2018
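The attention-based decoder mentioned in the AON abstract computes, at each output step, a weighting over the encoder's feature sequence and a context vector from which the next character is predicted. The sketch below shows one such step using plain dot-product scoring with a softmax. This scoring function is a simplification swapped in for illustration; the paper's decoder may score features differently, and the names `attention_step` and `softmax` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(features: Sequence[Sequence[float]],
                   query: Sequence[float]) -> Tuple[Vector, Vector]:
    """One decoding step of (simplified) attention.

    features: encoder feature vectors h_1..h_T, all of the same dimension.
    query:    current decoder state s, same dimension as each h_t.
    Returns (weights, context), where weights[t] is the attention on h_t
    and context = sum_t weights[t] * h_t.
    """
    # Dot-product score between the decoder state and each feature vector.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    # Weighted sum of feature vectors, computed dimension by dimension.
    context = [sum(w * feat[d] for w, feat in zip(weights, features))
               for d in range(dim)]
    return weights, context


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy feature sequence
    w, c = attention_step(feats, query=[2.0, 0.0])
    print([round(x, 3) for x in w], [round(x, 3) for x in c])
```

In a full decoder, the context vector would be combined with the previous character and hidden state to predict the next character, and the loop would repeat until an end-of-sequence token is emitted.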
