Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bill Barrett

PageNet: Page Boundary Extraction in Historical Handwritten Documents

Sep 05, 2017

Chris Tensmeyer, Brian Davis, Curtis Wigington, Iain Lee, Bill Barrett

Figure 1 for PageNet: Page Boundary Extraction in Historical Handwritten Documents

Figure 2 for PageNet: Page Boundary Extraction in Historical Handwritten Documents

Figure 3 for PageNet: Page Boundary Extraction in Historical Handwritten Documents

Figure 4 for PageNet: Page Boundary Extraction in Historical Handwritten Documents

Abstract:When digitizing a document into an image, it is common to include a surrounding border region to visually indicate that the entire document is present in the image. However, this border should be removed prior to automated processing. In this work, we present a deep learning based system, PageNet, which identifies the main page region in an image in order to segment content from both textual and non-textual border noise. In PageNet, a Fully Convolutional Network obtains a pixel-wise segmentation which is post-processed into the output quadrilateral region. We evaluate PageNet on 4 collections of historical handwritten documents and obtain over 94% mean intersection over union on all datasets and approach human performance on 2 of these collections. Additionally, we show that PageNet can segment documents that are overlayed on top of other documents.

* HIP 2017 (in submission)

Via

Access Paper or Ask Questions