Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Johannes Hoffart

Foundation Models for Tabular Data within Systemic Contexts Need Grounding

May 26, 2025

Tassilo Klein, Johannes Hoffart

Abstract:Current research on tabular foundation models often overlooks the complexities of large-scale, real-world data by treating tables as isolated entities and assuming information completeness, thereby neglecting the vital operational context. To address this, we introduce the concept of Semantically Linked Tables (SLT), recognizing that tables are inherently connected to both declarative and procedural operational knowledge. We propose Foundation Models for Semantically Linked Tables (FMSLT), which integrate these components to ground tabular data within its true operational context. This comprehensive representation unlocks the full potential of machine learning for complex, interconnected tabular data across diverse domains. Realizing FMSLTs requires access to operational knowledge that is often unavailable in public datasets, highlighting the need for close collaboration between domain experts and researchers. Our work exposes the limitations of current tabular foundation models and proposes a new direction centered on FMSLTs, aiming to advance robust, context-aware models for structured data.

Via

Access Paper or Ask Questions

SALT: Sales Autocompletion Linked Business Tables Dataset

Jan 06, 2025

Tassilo Klein, Clemens Biehl, Margarida Costa, Andre Sres, Jonas Kolk, Johannes Hoffart

Abstract:Foundation models, particularly those that incorporate Transformer architectures, have demonstrated exceptional performance in domains such as natural language processing and image processing. Adapting these models to structured data, like tables, however, introduces significant challenges. These difficulties are even more pronounced when addressing multi-table data linked via foreign key, which is prevalent in the enterprise realm and crucial for empowering business use cases. Despite its substantial impact, research focusing on such linked business tables within enterprise settings remains a significantly important yet underexplored domain. To address this, we introduce a curated dataset sourced from an Enterprise Resource Planning (ERP) system, featuring extensive linked tables. This dataset is specifically designed to support research endeavors in table representation learning. By providing access to authentic enterprise data, our goal is to potentially enhance the effectiveness and applicability of models for real-world business contexts.

* Table Representation Learning Workshop at NeurIPS 2024

Via

Access Paper or Ask Questions

PORTAL: Scalable Tabular Foundation Models via Content-Specific Tokenization

Oct 17, 2024

Marco Spinaci, Marek Polewczyk, Johannes Hoffart, Markus C. Kohler, Sam Thelin, Tassilo Klein

Figure 1 for PORTAL: Scalable Tabular Foundation Models via Content-Specific Tokenization

Figure 2 for PORTAL: Scalable Tabular Foundation Models via Content-Specific Tokenization

Figure 3 for PORTAL: Scalable Tabular Foundation Models via Content-Specific Tokenization

Figure 4 for PORTAL: Scalable Tabular Foundation Models via Content-Specific Tokenization

Abstract:Self-supervised learning on tabular data seeks to apply advances from natural language and image domains to the diverse domain of tables. However, current techniques often struggle with integrating multi-domain data and require data cleaning or specific structural requirements, limiting the scalability of pre-training datasets. We introduce PORTAL (Pretraining One-Row-at-a-Time for All tabLes), a framework that handles various data modalities without the need for cleaning or preprocessing. This simple yet powerful approach can be effectively pre-trained on online-collected datasets and fine-tuned to match state-of-the-art methods on complex classification and regression tasks. This work offers a practical advancement in self-supervised learning for large-scale tabular data.

* Accepted at Table Representation Learning Workshop at NeurIPS 2024

Via

Access Paper or Ask Questions

Large Process Models: Business Process Management in the Age of Generative AI

Sep 11, 2023

Timotheus Kampik, Christian Warmuth, Adrian Rebmann, Ron Agam, Lukas N. P. Egger, Andreas Gerber, Johannes Hoffart, Jonas Kolk, Philipp Herzig, Gero Decker(+5 more)

Figure 1 for Large Process Models: Business Process Management in the Age of Generative AI

Abstract:The continued success of Large Language Models (LLMs) and other generative artificial intelligence approaches highlights the advantages that large information corpora can have over rigidly defined symbolic models, but also serves as a proof-point of the challenges that purely statistics-based approaches have in terms of safety and trustworthiness. As a framework for contextualizing the potential, as well as the limitations of LLMs and other foundation model-based technologies, we propose the concept of a Large Process Model (LPM) that combines the correlation power of LLMs with the analytical precision and reliability of knowledge-based systems and automated reasoning approaches. LPMs are envisioned to directly utilize the wealth of process management experience that experts have accumulated, as well as process performance data of organizations with diverse characteristics, e.g., regarding size, region, or industry. In this vision, the proposed LPM would allow organizations to receive context-specific (tailored) process and other business models, analytical deep-dives, and improvement recommendations. As such, they would allow to substantially decrease the time and effort required for business transformation, while also allowing for deeper, more impactful, and more actionable insights than previously possible. We argue that implementing an LPM is feasible, but also highlight limitations and research challenges that need to be solved to implement particular aspects of the LPM vision.

Via

Access Paper or Ask Questions

Can Persistent Homology provide an efficient alternative for Evaluation of Knowledge Graph Completion Methods?

Jan 31, 2023

Anson Bastos, Kuldeep Singh, Abhishek Nadgeri, Johannes Hoffart, Toyotaro Suzumura, Manish Singh

Abstract:In this paper we present a novel method, $\textit{Knowledge Persistence}$ ($\mathcal{KP}$), for faster evaluation of Knowledge Graph (KG) completion approaches. Current ranking-based evaluation is quadratic in the size of the KG, leading to long evaluation times and consequently a high carbon footprint. $\mathcal{KP}$ addresses this by representing the topology of the KG completion methods through the lens of topological data analysis, concretely using persistent homology. The characteristics of persistent homology allow $\mathcal{KP}$ to evaluate the quality of the KG completion looking only at a fraction of the data. Experimental results on standard datasets show that the proposed metric is highly correlated with ranking metrics (Hits@N, MR, MRR). Performance evaluation shows that $\mathcal{KP}$ is computationally efficient: In some cases, the evaluation time (validation+test) of a KG completion method has been reduced from 18 hours (using Hits@10) to 27 seconds (using $\mathcal{KP}$), and on average (across methods & data) reduces the evaluation time (validation+test) by $\approx$ $\textbf{99.96}\%$.

* To appear in proceedings of The Web Conference 2023 (WWW'23)

Via

Access Paper or Ask Questions

HopfE: Knowledge Graph Representation Learning using Inverse Hopf Fibrations

Aug 12, 2021

Anson Bastos, Kuldeep Singh, Abhishek Nadgeri, Saeedeh Shekarpour, Isaiah Onando Mulang, Johannes Hoffart

Figure 1 for HopfE: Knowledge Graph Representation Learning using Inverse Hopf Fibrations

Figure 2 for HopfE: Knowledge Graph Representation Learning using Inverse Hopf Fibrations

Figure 3 for HopfE: Knowledge Graph Representation Learning using Inverse Hopf Fibrations

Figure 4 for HopfE: Knowledge Graph Representation Learning using Inverse Hopf Fibrations

Abstract:Recently, several Knowledge Graph Embedding (KGE) approaches have been devised to represent entities and relations in dense vector space and employed in downstream tasks such as link prediction. A few KGE techniques address interpretability, i.e., mapping the connectivity patterns of the relations (i.e., symmetric/asymmetric, inverse, and composition) to a geometric interpretation such as rotations. Other approaches model the representations in higher dimensional space such as four-dimensional space (4D) to enhance the ability to infer the connectivity patterns (i.e., expressiveness). However, modeling relation and entity in a 4D space often comes at the cost of interpretability. This paper proposes HopfE, a novel KGE approach aiming to achieve the interpretability of inferred relations in the four-dimensional space. We first model the structural embeddings in 3D Euclidean space and view the relation operator as an SO(3) rotation. Next, we map the entity embedding vector from a 3D space to a 4D hypersphere using the inverse Hopf Fibration, in which we embed the semantic information from the KG ontology. Thus, HopfE considers the structural and semantic properties of the entities without losing expressivity and interpretability. Our empirical results on four well-known benchmarks achieve state-of-the-art performance for the KG completion task.

* CIKM 2021 : 30th ACM International Conference on Information and Knowledge Management (full paper)

Via

Access Paper or Ask Questions

KGPool: Dynamic Knowledge Graph Context Selection for Relation Extraction

Jun 06, 2021

Abhishek Nadgeri, Anson Bastos, Kuldeep Singh, Isaiah Onando Mulang', Johannes Hoffart, Saeedeh Shekarpour, Vijay Saraswat

Figure 1 for KGPool: Dynamic Knowledge Graph Context Selection for Relation Extraction

Figure 2 for KGPool: Dynamic Knowledge Graph Context Selection for Relation Extraction

Figure 3 for KGPool: Dynamic Knowledge Graph Context Selection for Relation Extraction

Figure 4 for KGPool: Dynamic Knowledge Graph Context Selection for Relation Extraction

Abstract:We present a novel method for relation extraction (RE) from a single sentence, mapping the sentence and two given entities to a canonical fact in a knowledge graph (KG). Especially in this presumed sentential RE setting, the context of a single sentence is often sparse. This paper introduces the KGPool method to address this sparsity, dynamically expanding the context with additional facts from the KG. It learns the representation of these facts (entity alias, entity descriptions, etc.) using neural methods, supplementing the sentential context. Unlike existing methods that statically use all expanded facts, KGPool conditions this expansion on the sentence. We study the efficacy of KGPool by evaluating it with different neural models and KGs (Wikidata and NYT Freebase). Our experimental evaluation on standard datasets shows that by feeding the KGPool representation into a Graph Neural Network, the overall method is significantly more accurate than state-of-the-art methods.

* ACL 2021 (findings)

Via

Access Paper or Ask Questions

CHOLAN: A Modular Approach for Neural Entity Linking on Wikipedia and Wikidata

Feb 08, 2021

Manoj Prabhakar Kannan Ravi, Kuldeep Singh, Isaiah Onando Mulang', Saeedeh Shekarpour, Johannes Hoffart, Jens Lehmann

Figure 1 for CHOLAN: A Modular Approach for Neural Entity Linking on Wikipedia and Wikidata

Figure 2 for CHOLAN: A Modular Approach for Neural Entity Linking on Wikipedia and Wikidata

Figure 3 for CHOLAN: A Modular Approach for Neural Entity Linking on Wikipedia and Wikidata

Figure 4 for CHOLAN: A Modular Approach for Neural Entity Linking on Wikipedia and Wikidata

Abstract:In this paper, we propose CHOLAN, a modular approach to target end-to-end entity linking (EL) over knowledge bases. CHOLAN consists of a pipeline of two transformer-based models integrated sequentially to accomplish the EL task. The first transformer model identifies surface forms (entity mentions) in a given text. For each mention, a second transformer model is employed to classify the target entity among a predefined candidates list. The latter transformer is fed by an enriched context captured from the sentence (i.e. local context), and entity description gained from Wikipedia. Such external contexts have not been used in the state of the art EL approaches. Our empirical study was conducted on two well-known knowledge bases (i.e., Wikidata and Wikipedia). The empirical results suggest that CHOLAN outperforms state-of-the-art approaches on standard datasets such as CoNLL-AIDA, MSNBC, AQUAINT, ACE2004, and T-REx.

* accepted in EACL 2021 (full paper)

Via

Access Paper or Ask Questions

RECON: Relation Extraction using Knowledge Graph Context in a Graph Neural Network

Sep 18, 2020

Anson Bastos, Abhishek Nadgeri, Kuldeep Singh, Isaiah Onando Mulang', Saeedeh Shekarpour, Johannes Hoffart

Figure 1 for RECON: Relation Extraction using Knowledge Graph Context in a Graph Neural Network

Figure 2 for RECON: Relation Extraction using Knowledge Graph Context in a Graph Neural Network

Figure 3 for RECON: Relation Extraction using Knowledge Graph Context in a Graph Neural Network

Figure 4 for RECON: Relation Extraction using Knowledge Graph Context in a Graph Neural Network

Abstract:In this paper, we present a novel method named RECON, that automatically identifies relations in a sentence (sentential relation extraction) and aligns to a knowledge graph (KG). RECON uses a graph neural network to learn representations of both the sentence as well as facts stored in a KG, improving the overall extraction quality. These facts, including entity attributes (label, alias, description, instance-of) and factual triples, have not been collectively used in the state of the art methods. We evaluate the effect of various forms of representing the KG context on the performance of RECON. The empirical evaluation on two standard relation extraction datasets shows that RECON significantly outperforms all state of the art methods on NYT Freebase and Wikidata datasets. RECON reports 87.23 F1 score (Vs 82.29 baseline) on Wikidata dataset whereas on NYT Freebase, reported values are 87.5(P@10) and 74.1(P@30) compared to the previous baseline scores of 81.3(P@10) and 63.1(P@30).

* under review

Via

Access Paper or Ask Questions

Evaluating the Impact of Knowledge Graph Context on Entity Disambiguation Models

Aug 30, 2020

Isaiah Onando Mulang', Kuldeep Singh, Chaitali Prabhu, Abhishek Nadgeri, Johannes Hoffart, Jens Lehmann

Figure 1 for Evaluating the Impact of Knowledge Graph Context on Entity Disambiguation Models

Figure 2 for Evaluating the Impact of Knowledge Graph Context on Entity Disambiguation Models

Figure 3 for Evaluating the Impact of Knowledge Graph Context on Entity Disambiguation Models

Figure 4 for Evaluating the Impact of Knowledge Graph Context on Entity Disambiguation Models

Abstract:Pretrained Transformer models have emerged as state-of-the-art approaches that learn contextual information from text to improve the performance of several NLP tasks. These models, albeit powerful, still require specialized knowledge in specific scenarios. In this paper, we argue that context derived from a knowledge graph (in our case: Wikidata) provides enough signals to inform pretrained transformer models and improve their performance for named entity disambiguation (NED) on Wikidata KG. We further hypothesize that our proposed KG context can be standardized for Wikipedia, and we evaluate the impact of KG context on state-of-the-art NED model for the Wikipedia knowledge base. Our empirical results validate that the proposed KG context can be generalized (for Wikipedia), and providing KG context in transformer architectures considerably outperforms the existing baselines, including the vanilla transformer models.

* CIKM 2020
* to appear in proceedings of CIKM 2020

Via

Access Paper or Ask Questions