Mohammad Reza Sarshogh

A Multitask Network for Localization and Recognition of Text in Images

Jun 21, 2019

Mohammad Reza Sarshogh, Keegan E. Hines

Figure 1 for A Multitask Network for Localization and Recognition of Text in Images

Figure 2 for A Multitask Network for Localization and Recognition of Text in Images

Figure 3 for A Multitask Network for Localization and Recognition of Text in Images

Figure 4 for A Multitask Network for Localization and Recognition of Text in Images

Abstract: We present an end-to-end trainable multi-task network that addresses the problem of lexicon-free text extraction from complex documents. The network solves text localization and text recognition simultaneously, and it identifies text segments with no post-processing, cropping, or word grouping. A convolutional backbone and a Feature Pyramid Network are combined to provide a shared representation that benefits each of the three model heads: text localization, classification, and text recognition. To improve recognition accuracy, we describe a dynamic pooling mechanism that retains high-resolution information across all RoIs. For text recognition, we propose a convolutional mechanism with attention that outperforms more common recurrent architectures. Our model is evaluated against benchmark datasets and comparable methods, and it achieves high performance in challenging regimes of non-traditional OCR.
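The abstract does not spell out how the dynamic pooling works. One plausible reading, assumed here and not taken from the paper, is to pool every RoI to a fixed height but give it a width proportional to its aspect ratio, so a long text line keeps more horizontal cells than a fixed square grid would allow. The RoIs in a batch are then right-padded to a common width. The sketch below works on a single-channel feature map in pure Python. The function names, the `out_h`/`max_w` parameters, and the use of max pooling with zero padding are all hypothetical illustration choices, not the authors' implementation.

```python
import math
from typing import List, Tuple

FeatureMap = List[List[float]]   # single-channel feature map, indexed [row][col]
Roi = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in feature-map cells, x1/y1 exclusive


def dynamic_output_width(roi: Roi, out_h: int, max_w: int) -> int:
    """Hypothetical rule: pooled width follows the RoI aspect ratio, clamped to [1, max_w]."""
    x0, y0, x1, y1 = roi
    width = round(out_h * (x1 - x0) / (y1 - y0))
    return max(1, min(width, max_w))


def roi_max_pool(fmap: FeatureMap, roi: Roi, out_h: int, out_w: int) -> FeatureMap:
    """Split the RoI into an out_h x out_w grid of bins and take the max of each bin."""
    x0, y0, x1, y1 = roi
    bin_h = (y1 - y0) / out_h
    bin_w = (x1 - x0) / out_w
    pooled = []
    for i in range(out_h):
        r0 = y0 + math.floor(i * bin_h)
        r1 = max(r0 + 1, y0 + math.ceil((i + 1) * bin_h))  # every bin covers >= 1 row
        row = []
        for j in range(out_w):
            c0 = x0 + math.floor(j * bin_w)
            c1 = max(c0 + 1, x0 + math.ceil((j + 1) * bin_w))  # every bin covers >= 1 column
            row.append(max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)))
        pooled.append(row)
    return pooled


def dynamic_pool_batch(fmap: FeatureMap, rois: List[Roi], out_h: int = 2,
                       max_w: int = 16, pad_value: float = 0.0) -> List[FeatureMap]:
    """Pool each RoI to a fixed height and aspect-dependent width, then right-pad to the widest RoI."""
    pooled = [roi_max_pool(fmap, roi, out_h, dynamic_output_width(roi, out_h, max_w))
              for roi in rois]
    batch_w = max(len(p[0]) for p in pooled)
    return [[row + [pad_value] * (batch_w - len(row)) for row in p] for p in pooled]


if __name__ == "__main__":
    # 4 x 8 toy feature map whose cell value is row * 8 + col.
    fmap = [[float(r * 8 + c) for c in range(8)] for r in range(4)]
    wide, square = dynamic_pool_batch(fmap, [(0, 0, 8, 4), (0, 0, 4, 4)])
    print(wide)    # the 2:1 RoI keeps 4 columns
    print(square)  # the 1:1 RoI gets 2 columns plus 2 padding cells
```

With a fixed 2 x 2 output grid, the wide RoI would be squeezed to the same two columns as the square one. Under this assumed scheme it keeps four, which is the kind of horizontal resolution a character-level recognition head would benefit from.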

* ICDAR 2019
