Rémi Juge

Table-Of-Contents generation on contemporary documents

Nov 20, 2019

Najah-Imane Bentabet, Rémi Juge, Sira Ferradans

Abstract: The generation of a precise and detailed Table-Of-Contents (TOC) from a document is a problem of major importance for document understanding and information extraction. Despite its importance, it remains a challenging task, especially for non-standardized documents with rich layout information, such as commercial documents. In this paper, we present a new neural-based pipeline for TOC generation applicable to any searchable document. Unlike previous methods, we neither use semantic labeling nor assume the presence of parsable TOC pages in the document. Moreover, we analyze the influence of using external knowledge encoded as a template, and we empirically show that this approach is only useful in a very low-resource environment. Finally, we propose a new domain-specific data set that sheds some light on the difficulties of TOC generation in real-world documents. The proposed method outperforms the state of the art on a public data set and on the newly released data set.

* ICDAR 2019 Main Conference paper
