Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zineddine Bettouche

Synthetic Data Augmentation for Table Detection: Re-evaluating TableNet's Performance with Automatically Generated Document Images

Jun 17, 2025

Krishna Sahukara, Zineddine Bettouche, Andreas Fischer

Abstract:Document pages captured by smartphones or scanners often contain tables, yet manual extraction is slow and error-prone. We introduce an automated LaTeX-based pipeline that synthesizes realistic two-column pages with visually diverse table layouts and aligned ground-truth masks. The generated corpus augments the real-world Marmot benchmark and enables a systematic resolution study of TableNet. Training TableNet on our synthetic data achieves a pixel-wise XOR error of 4.04% on our synthetic test set with a 256x256 input resolution, and 4.33% with 1024x1024. The best performance on the Marmot benchmark is 9.18% (at 256x256), while cutting manual annotation effort through automation.

Via

Access Paper or Ask Questions

Contextual Categorization Enhancement through LLMs Latent-Space

Apr 25, 2024

Zineddine Bettouche, Anas Safi, Andreas Fischer

Abstract:Managing the semantic quality of the categorization in large textual datasets, such as Wikipedia, presents significant challenges in terms of complexity and cost. In this paper, we propose leveraging transformer models to distill semantic information from texts in the Wikipedia dataset and its associated categories into a latent space. We then explore different approaches based on these encodings to assess and enhance the semantic identity of the categories. Our graphical approach is powered by Convex Hull, while we utilize Hierarchical Navigable Small Worlds (HNSWs) for the hierarchical approach. As a solution to the information loss caused by the dimensionality reduction, we modulate the following mathematical solution: an exponential decay function driven by the Euclidean distances between the high-dimensional encodings of the textual categories. This function represents a filter built around a contextual category and retrieves items with a certain Reconsideration Probability (RP). Retrieving high-RP items serves as a tool for database administrators to improve data groupings by providing recommendations and identifying outliers within a contextual framework.

* Fifteenth International Conference on Computational Logics, Algebras, Programming, Tools, and Benchmarking (COMPUTATION TOOLS 2024), ISSN: 2308-4170

Via

Access Paper or Ask Questions

Improving Image Tracing with Convolutional Autoencoders by High-Pass Filter Preprocessing

Jun 15, 2023

Zineddine Bettouche, Andreas Fischer

Abstract:The process of transforming a raster image into a vector representation is known as image tracing. This study looks into several processing methods that include high-pass filtering, autoencoding, and vectorization to extract an abstract representation of an image. According to the findings, rebuilding an image with autoencoders, high-pass filtering it, and then vectorizing it can represent the image more abstractly while increasing the effectiveness of the vectorization process.

* IARIA Journal on Advances in Software, ISSN: 1942-2628, vol. 15, pp. 141-151, 2022

Via

Access Paper or Ask Questions

Mapping Researcher Activity based on Publication Data by means of Transformers

Jun 15, 2023

Zineddine Bettouche, Andreas Fischer

Abstract:Modern performance on several natural language processing (NLP) tasks has been enhanced thanks to the Transformer-based pre-trained language model BERT. We employ this concept to investigate a local publication database. Research papers are encoded and clustered to form a landscape view of the scientific topics, in which research is active. Authors working on similar topics can be identified by calculating the similarity between their papers. Based on this, we define a similarity metric between authors. Additionally we introduce the concept of self-similarity to indicate the topical variety of authors.

* Proc. of the Interdisciplinary Conference on Mechanics, Computers and Electrics (ICMECE 2022)

Via

Access Paper or Ask Questions