Title: Platypus: A Generalized Specialist Model for Reading Text in Various Forms

Aug 27, 2024

Peng Wang, Zhaohai Li, Jun Tang, Humen Zhong, Fei Huang, Zhibo Yang, Cong Yao

Abstract: Reading text from images (either natural scenes or documents) has been a long-standing research topic for decades, due to the high technical challenge and wide application range. Previously, individual specialist models were developed to tackle the sub-tasks of text reading (e.g., scene text recognition, handwritten text recognition, and mathematical expression recognition). However, such specialist models usually cannot effectively generalize across different sub-tasks. Recently, generalist models (such as GPT-4V), trained on tremendous amounts of data in a unified way, have shown enormous potential in reading text in various scenarios, but with the drawbacks of limited accuracy and low efficiency. In this work, we propose Platypus, a generalized specialist model for text reading. Specifically, Platypus combines the best of both worlds: it is able to recognize text of various forms with a single unified architecture, while achieving excellent accuracy and high efficiency. To better exploit the advantages of Platypus, we also construct a text reading dataset (called Worms), the images of which are curated from previous datasets and partially re-labeled. Experiments on standard benchmarks demonstrate the effectiveness and superiority of the proposed Platypus model. Model and data will be made publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/OCR/Platypus.

* Accepted by ECCV 2024
