Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mandy Toh

TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text

May 12, 2021

Amanpreet Singh, Guan Pang, Mandy Toh, Jing Huang, Wojciech Galuba, Tal Hassner

Figure 1 for TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text

Figure 2 for TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text

Figure 3 for TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text

Figure 4 for TextOCR: Towards large-scale end-to-end reasoning for arbitrary-shaped scene text

Abstract:A crucial component for the scene text based reasoning required for TextVQA and TextCaps datasets involve detecting and recognizing text present in the images using an optical character recognition (OCR) system. The current systems are crippled by the unavailability of ground truth text annotations for these datasets as well as lack of scene text detection and recognition datasets on real images disallowing the progress in the field of OCR and evaluation of scene text based reasoning in isolation from OCR systems. In this work, we propose TextOCR, an arbitrary-shaped scene text detection and recognition with 900k annotated words collected on real images from TextVQA dataset. We show that current state-of-the-art text-recognition (OCR) models fail to perform well on TextOCR and that training on TextOCR helps achieve state-of-the-art performance on multiple other OCR datasets as well. We use a TextOCR trained OCR model to create PixelM4C model which can do scene text based reasoning on an image in an end-to-end fashion, allowing us to revisit several design choices to achieve new state-of-the-art performance on TextVQA dataset.

* To appear in CVPR 2021. 15 pages, 7 figures. First three authors contributed equally. More info at https://textvqa.org/textocr

Via

Access Paper or Ask Questions

A Multiplexed Network for End-to-End, Multilingual OCR

Mar 29, 2021

Jing Huang, Guan Pang, Rama Kovvuri, Mandy Toh, Kevin J Liang, Praveen Krishnan, Xi Yin, Tal Hassner

Figure 1 for A Multiplexed Network for End-to-End, Multilingual OCR

Figure 2 for A Multiplexed Network for End-to-End, Multilingual OCR

Figure 3 for A Multiplexed Network for End-to-End, Multilingual OCR

Figure 4 for A Multiplexed Network for End-to-End, Multilingual OCR

Abstract:Recent advances in OCR have shown that an end-to-end (E2E) training pipeline that includes both detection and recognition leads to the best results. However, many existing methods focus primarily on Latin-alphabet languages, often even only case-insensitive English characters. In this paper, we propose an E2E approach, Multiplexed Multilingual Mask TextSpotter, that performs script identification at the word level and handles different scripts with different recognition heads, all while maintaining a unified loss that simultaneously optimizes script identification and multiple recognition heads. Experiments show that our method outperforms the single-head model with similar number of parameters in end-to-end recognition tasks, and achieves state-of-the-art results on MLT17 and MLT19 joint text detection and script identification benchmarks. We believe that our work is a step towards the end-to-end trainable and scalable multilingual multi-purpose OCR system. Our code and model will be released.

Via

Access Paper or Ask Questions