Christian Reisswig

BERTgrid: Contextualized Embedding for 2D Document Representation and Understanding

Oct 14, 2019

Timo I. Denk, Christian Reisswig


Abstract: For understanding generic documents, information like font sizes, column layout, and generally the positioning of words may carry semantic information that is crucial for solving a downstream document intelligence task. Our novel BERTgrid, which is based on Chargrid by Katti et al. (2018), represents a document as a grid of contextualized word piece embedding vectors, thereby making its spatial structure and semantics accessible to the processing neural network. The contextualized embedding vectors are retrieved from a BERT language model. We use BERTgrid in combination with a fully convolutional network on a semantic instance segmentation task for extracting fields from invoices. We demonstrate its performance on tabulated line item and document header field extraction.

* 4 pages, accepted at the "Document Intelligence" workshop of 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada
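
The grid of word piece embedding vectors described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_embedding_grid`, the input format (one bounding box plus one vector per word piece), the zero-vector background, and the tiny 2-dimensional vectors are all assumptions made for illustration. In practice the vectors would come from a BERT language model and the grid would be fed to a convolutional network.

```python
from typing import List, Sequence, Tuple

# Hypothetical input format: ((x0, y0, x1, y1), embedding) per word piece,
# with pixel coordinates and x1/y1 exclusive.
Box = Tuple[int, int, int, int]
Piece = Tuple[Box, Sequence[float]]


def build_embedding_grid(
    pieces: Sequence[Piece], width: int, height: int, dim: int
) -> List[List[List[float]]]:
    """Return a height x width x dim grid of embedding vectors.

    Every pixel covered by a word piece's box receives that piece's
    vector; uncovered pixels keep an all-zero background vector
    (an assumption of this sketch).
    """
    grid = [[[0.0] * dim for _ in range(width)] for _ in range(height)]
    for (x0, y0, x1, y1), vec in pieces:
        if len(vec) != dim:
            raise ValueError("embedding has wrong dimensionality")
        # Clip the box to the page so out-of-range boxes are tolerated.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = list(vec)
    return grid


# Stand-in vectors; a real pipeline would use BERT outputs here.
pieces = [
    ((0, 0, 2, 1), [0.5, -1.0]),  # e.g. the word piece "In"
    ((2, 0, 4, 1), [1.5, 2.0]),   # e.g. the word piece "##voice"
]
embedding_grid = build_embedding_grid(pieces, width=5, height=2, dim=2)
```

Unlike a character-index grid, each cell here carries a dense vector, so two occurrences of the same word can be represented differently depending on their context.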


Chargrid-OCR: End-to-end trainable Optical Character Recognition through Semantic Segmentation and Object Detection

Sep 13, 2019

Christian Reisswig, Anoop R Katti, Marco Spinaci, Johannes Höhne


Abstract: We present an end-to-end trainable approach for optical character recognition (OCR) on printed documents. It is based on predicting a two-dimensional character grid (chargrid) representation of a document image as a semantic segmentation task. To identify individual character instances from the chargrid, we regard characters as objects and use object detection techniques from computer vision. We demonstrate experimentally that our method outperforms previous state-of-the-art approaches in accuracy while being easily parallelizable on GPU (therefore being significantly faster), as well as easier to train.

* 4 pages


Chargrid: Towards Understanding 2D Documents

Sep 24, 2018

Anoop Raveendra Katti, Christian Reisswig, Cordula Guder, Sebastian Brarda, Steffen Bickel, Johannes Höhne, Jean Baptiste Faddoul


Abstract: We introduce a novel type of text representation that preserves the 2D layout of a document. This is achieved by encoding each document page as a two-dimensional grid of characters. Based on this representation, we present a generic document understanding pipeline for structured documents. This pipeline makes use of a fully convolutional encoder-decoder network that predicts a segmentation mask and bounding boxes. We demonstrate its capabilities on an information extraction task from invoices and show that it significantly outperforms approaches based on sequential text or document images.

* To be published at EMNLP 2018
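
The two-dimensional grid of characters mentioned in the abstract above can be sketched as follows. This is an illustrative assumption, not the paper's code: the function name `build_chargrid`, the `(character, x0, y0, x1, y1)` input format, the vocabulary, and the choice of index 0 for background and 1 for unknown characters are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical input format: (character, x0, y0, x1, y1) in pixel
# coordinates, with x1/y1 exclusive, e.g. as produced by an OCR engine.
CharBox = Tuple[str, int, int, int, int]

BACKGROUND = 0  # assumed index for pixels not covered by any character
UNKNOWN = 1     # assumed index for characters missing from the vocabulary


def build_chargrid(
    boxes: Sequence[CharBox], width: int, height: int, vocab: Dict[str, int]
) -> List[List[int]]:
    """Return a height x width grid of character indices.

    Each pixel inside a character's bounding box is set to that
    character's vocabulary index, which preserves the page's 2D layout.
    """
    grid = [[BACKGROUND] * width for _ in range(height)]
    for ch, x0, y0, x1, y1 in boxes:
        idx = vocab.get(ch, UNKNOWN)
        # Clip the box to the page so out-of-range boxes are tolerated.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                grid[y][x] = idx
    return grid


vocab = {"A": 2, "B": 3}
boxes = [("A", 0, 0, 2, 2), ("B", 3, 1, 5, 3)]
chargrid = build_chargrid(boxes, width=6, height=3, vocab=vocab)
for row in chargrid:
    print(row)
# [2, 2, 0, 0, 0, 0]
# [2, 2, 0, 3, 3, 0]
# [0, 0, 0, 3, 3, 0]
```

A grid like this would typically be one-hot encoded along a channel axis before being passed to a convolutional network; that step is omitted here to keep the sketch dependency-free.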
