Title: Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images

Dec 13, 2022

Hongkuan Zhang, Edward Whittaker, Ikuo Kitagishi

Abstract: Digitization of scanned receipts aims to extract text from receipt images and save it into structured documents. This is usually split into two sub-tasks: text localization and optical character recognition (OCR). Most existing OCR models focus only on cropped text instance images, which require bounding box information provided by a text region detection model. Introducing an additional detector to identify the text instance images in advance is inefficient; however, instance-level OCR models have very low accuracy when applied to a whole image for document-level OCR, such as a receipt image containing multiple text lines arranged in various layouts. To this end, we propose a localization-free document-level OCR model for transcribing all the characters in a receipt image into an ordered sequence end-to-end. Specifically, we finetune the pretrained Transformer-based instance-level model TrOCR with randomly cropped image chunks, and gradually increase the image chunk size to generalize the recognition ability from instance images to full-page images. In our experiments on the SROIE receipt OCR dataset, the model finetuned with our strategy achieved a word-level F1-score of 64.4 and a character-level character error rate (CER) of 22.8%, outperforming the baseline's 48.5 F1-score and 50.6% CER. The best model, which splits the full image into 15 equally sized chunks, gives an 87.8 F1-score and 4.98% CER with minimal additional pre- or post-processing of the output. Moreover, the characters in the generated document-level sequences are arranged in reading order, which is practical for real-world applications.
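The abstract does not say how the full image is divided into 15 chunks or exactly how CER is computed, so the following is only a minimal sketch under two assumptions: that the chunks are horizontal strips of near-equal height, and that CER is the Levenshtein edit distance divided by the reference length. The names `chunk_bounds`, `edit_distance`, and `cer` are hypothetical helpers, not taken from the paper.

```python
from typing import List, Tuple


def chunk_bounds(height: int, n_chunks: int) -> List[Tuple[int, int]]:
    """Return (top, bottom) pixel rows for n_chunks near-equal horizontal strips.

    Assumption: the page is cut along the vertical axis only.
    """
    if n_chunks <= 0 or height < n_chunks:
        raise ValueError("need 1 <= n_chunks <= height")
    # Integer arithmetic keeps the strips contiguous and covers every row once.
    edges = [i * height // n_chunks for i in range(n_chunks + 1)]
    return list(zip(edges[:-1], edges[1:]))


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # A 1500-pixel-tall receipt split into 15 strips of 100 rows each.
    print(chunk_bounds(1500, 15)[:3])
    # One wrong character out of eleven.
    print(cer("TOTAL 12.50", "TOTAL 12.80"))
```

In a pipeline along these lines, each strip would be transcribed separately and the outputs concatenated from top to bottom before scoring against the reference transcript; whether the paper follows that exact procedure is not stated in the abstract.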
