Andrea Gemelli

BoundingDocs: a Unified Dataset for Document Question Answering with Spatial Annotations

Jan 06, 2025

Simone Giovannini, Fabio Coppini, Andrea Gemelli, Simone Marinai

Abstract: We present a unified dataset for document Question-Answering (QA), obtained by combining several public datasets related to Document AI and visually rich document understanding (VRDU). Our main contribution is twofold: on the one hand, we reformulate existing Document AI tasks, such as Information Extraction (IE), into a Question-Answering task, making the dataset a suitable resource for training and evaluating Large Language Models; on the other hand, we release the OCR of all the documents and include the exact position of the answer to be found in the document image as a bounding box. Using this dataset, we explore the impact of different prompting techniques (that might include bounding box information) on the performance of open-weight models, identifying the most effective approaches for document comprehension.
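As a rough illustration of what recasting an Information Extraction annotation as a QA example with a spatial answer might look like, here is a minimal Python sketch. The record layout, the field names (`question`, `answer`, `bbox`), the question template, and the `(x0, y0, x1, y1)` pixel convention are all assumptions made for this example; they are not taken from the BoundingDocs release.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class QARecord:
    """Hypothetical QA example with the answer's location on the page."""
    question: str
    answer: str
    # Assumed convention: (x0, y0, x1, y1) in pixels.
    bbox: Tuple[int, int, int, int]


def ie_to_qa(field_name: str, value: str,
             bbox: Tuple[int, int, int, int]) -> QARecord:
    """Turn one key-value IE annotation into a question-answer pair.

    The question template is an illustrative choice, not the dataset's.
    """
    question = f"What is the {field_name.replace('_', ' ')}?"
    return QARecord(question=question, answer=value, bbox=bbox)


record = ie_to_qa("invoice_date", "2024-03-01", (120, 48, 260, 70))
print(record.question)
print(record.answer, record.bbox)
```

Keeping the bounding box next to the answer lets a prompt optionally include spatial information, which is the kind of variation the abstract says is explored.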

CTE: A Dataset for Contextualized Table Extraction

Feb 13, 2023

Andrea Gemelli, Emanuele Vivoli, Simone Marinai

Abstract: Relevant information in documents is often summarized in tables, helping the reader to identify useful facts. Most benchmark datasets support either document layout analysis or table understanding, but do not provide data for addressing both tasks in a unified way. We define the task of Contextualized Table Extraction (CTE), which aims to extract and define the structure of tables considering the textual context of the document. The dataset comprises 75k fully annotated pages of scientific papers, including more than 35k tables. Data are gathered from PubMed Central, merging the information provided by annotations in the PubTables-1M and PubLayNet datasets. The dataset can support CTE and adds new classes to the original ones. The generated annotations can be used to develop end-to-end pipelines for various tasks, including document layout analysis, table detection, structure recognition, and functional analysis. We formally define CTE and evaluation metrics, showing which subtasks can be tackled and describing advantages, limitations, and future work for this collection of data. Annotations and code will be accessible at https://github.com/AILab-UniFI/cte-dataset.

Data augmentation on graphs for table type classification

Aug 23, 2022

Davide del Bimbo, Andrea Gemelli, Simone Marinai

Abstract: Tables are widely used in documents because of their compact and structured representation of information. In particular, in scientific papers, tables can sum up novel discoveries and summarize experimental results, making the research comparable and easily understandable by scholars. Since the layout of tables is highly variable, it would be useful to interpret their content and classify them into categories. This could help to extract information directly from scientific papers, for instance to compare the performance of models using the result tables in their papers. In this work, we address the classification of tables using a Graph Neural Network, exploiting the table structure for the message passing algorithm in use. We evaluate our model on a subset of the Tab2Know dataset. Since it contains few manually annotated examples, we propose data augmentation techniques directly on the table graph structures. We achieve promising preliminary results, proposing a data augmentation method suitable for graph-based table representation.

* S+SSPR 2022

Graph Neural Networks and Representation Embedding for Table Extraction in PDF Documents

Aug 23, 2022

Andrea Gemelli, Emanuele Vivoli, Simone Marinai

Abstract: Tables are widely used in several types of documents since they can convey important information in a structured way. In scientific papers, tables can sum up novel discoveries and summarize experimental results, making the research comparable and easily understandable by scholars. Several methods perform table analysis on document images, losing useful information during the conversion from the PDF files, since OCR tools can be prone to recognition errors, in particular for text inside tables. The main contribution of this work is to tackle the problem of table extraction by exploiting Graph Neural Networks. Node features are enriched with suitably designed representation embeddings. These representations help to better distinguish not only tables from the other parts of the paper, but also table cells from table headers. We experimentally evaluated the proposed approach on a new dataset obtained by merging the information provided in the PubLayNet and PubTables-1M datasets.

* ICPR 2022

Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks

Aug 23, 2022

Andrea Gemelli, Sanket Biswas, Enrico Civitelli, Josep Lladós, Simone Marinai

Abstract: Geometric Deep Learning has recently attracted significant interest in a wide range of machine learning fields, including document analysis. The application of Graph Neural Networks (GNNs) has become crucial in various document-related tasks since they can unravel important structural patterns, fundamental in key information extraction processes. Previous works in the literature propose task-driven models and do not take into account the full power of graphs. We propose Doc2Graph, a task-agnostic document understanding framework based on a GNN model, to solve different tasks given different types of documents. We evaluated our approach on two challenging datasets for key information extraction in form understanding, invoice layout analysis, and table detection. Our code is freely accessible at https://github.com/andreagemelli/doc2graph.

* TiE ECCV 2022
