Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Michael Martin

LLM-KG-Bench 3.0: A Compass for SemanticTechnology Capabilities in the Ocean of LLMs

May 19, 2025

Lars-Peter Meyer, Johannes Frey, Desiree Heim, Felix Brei, Claus Stadler, Kurt Junghanns, Michael Martin

Abstract:Current Large Language Models (LLMs) can assist developing program code beside many other things, but can they support working with Knowledge Graphs (KGs) as well? Which LLM is offering the best capabilities in the field of Semantic Web and Knowledge Graph Engineering (KGE)? Is this possible to determine without checking many answers manually? The LLM-KG-Bench framework in Version 3.0 is designed to answer these questions. It consists of an extensible set of tasks for automated evaluation of LLM answers and covers different aspects of working with semantic technologies. In this paper the LLM-KG-Bench framework is presented in Version 3 along with a dataset of prompts, answers and evaluations generated with it and several state-of-the-art LLMs. Significant enhancements have been made to the framework since its initial release, including an updated task API that offers greater flexibility in handling evaluation tasks, revised tasks, and extended support for various open models through the vllm library, among other improvements. A comprehensive dataset has been generated using more than 30 contemporary open and proprietary LLMs, enabling the creation of exemplary model cards that demonstrate the models' capabilities in working with RDF and SPARQL, as well as comparing their performance on Turtle and JSON-LD RDF serialization tasks.

* Peer reviewed publication at ESWC 2025 Resources Track

Via

Access Paper or Ask Questions

Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering

Aug 31, 2023

Lars-Peter Meyer, Johannes Frey, Kurt Junghanns, Felix Brei, Kirill Bulert, Sabine Gründer-Fahrer, Michael Martin

Figure 1 for Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering

Figure 2 for Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering

Figure 3 for Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering

Abstract:As the field of Large Language Models (LLMs) evolves at an accelerated pace, the critical need to assess and monitor their performance emerges. We introduce a benchmarking framework focused on knowledge graph engineering (KGE) accompanied by three challenges addressing syntax and error correction, facts extraction and dataset generation. We show that while being a useful tool, LLMs are yet unfit to assist in knowledge graph generation with zero-shot prompting. Consequently, our LLM-KG-Bench framework provides automatic evaluation and storage of LLM responses as well as statistical data and visualization tools to support tracking of prompt engineering and model performance.

* To be published in SEMANTICS 2023 poster track proceedings. SEMANTICS 2023 EU: 19th International Conference on Semantic Systems, September 20-22, 2023, Leipzig, Germany

Via

Access Paper or Ask Questions

LLM-assisted Knowledge Graph Engineering: Experiments with ChatGPT

Jul 13, 2023

Lars-Peter Meyer, Claus Stadler, Johannes Frey, Norman Radtke, Kurt Junghanns, Roy Meissner, Gordian Dziwis, Kirill Bulert, Michael Martin

Abstract:Knowledge Graphs (KG) provide us with a structured, flexible, transparent, cross-system, and collaborative way of organizing our knowledge and data across various domains in society and industrial as well as scientific disciplines. KGs surpass any other form of representation in terms of effectiveness. However, Knowledge Graph Engineering (KGE) requires in-depth experiences of graph structures, web technologies, existing models and vocabularies, rule sets, logic, as well as best practices. It also demands a significant amount of work. Considering the advancements in large language models (LLMs) and their interfaces and applications in recent years, we have conducted comprehensive experiments with ChatGPT to explore its potential in supporting KGE. In this paper, we present a selection of these experiments and their results to demonstrate how ChatGPT can assist us in the development and management of KGs.

* to appear in conference proceedings of AI-Tomorrow-23, 29.+30.6.2023 in Leipzig, Germany

Via

Access Paper or Ask Questions

The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages

Apr 19, 2023

Vesa Akerman, David Baines, Damien Daspit, Ulf Hermjakob, Taeho Jang, Colin Leong, Michael Martin, Joel Mathew, Jonathan Robie, Marcus Schwarting

Figure 1 for The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages

Figure 2 for The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages

Figure 3 for The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages

Figure 4 for The eBible Corpus: Data and Model Benchmarks for Bible Translation for Low-Resource Languages

Abstract:Efficiently and accurately translating a corpus into a low-resource language remains a challenge, regardless of the strategies employed, whether manual, automated, or a combination of the two. Many Christian organizations are dedicated to the task of translating the Holy Bible into languages that lack a modern translation. Bible translation (BT) work is currently underway for over 3000 extremely low resource languages. We introduce the eBible corpus: a dataset containing 1009 translations of portions of the Bible with data in 833 different languages across 75 language families. In addition to a BT benchmarking dataset, we introduce model performance benchmarks built on the No Language Left Behind (NLLB) neural machine translation (NMT) models. Finally, we describe several problems specific to the domain of BT and consider how the established data and model benchmarks might be used for future translation efforts. For a BT task trained with NLLB, Austronesian and Trans-New Guinea language families achieve 35.1 and 31.6 BLEU scores respectively, which spurs future innovations for NMT for low-resource languages in Papua New Guinea.

Via

Access Paper or Ask Questions

Retention Is All You Need

Apr 06, 2023

Karishma Mohiuddin, Mirza Ariful Alam, Mirza Mohtashim Alam, Pascal Welke, Michael Martin, Jens Lehmann, Sahar Vahdati

Abstract:Skilled employees are usually seen as the most important pillar of an organization. Despite this, most organizations face high attrition and turnover rates. While several machine learning models have been developed for analyzing attrition and its causal factors, the interpretations of those models remain opaque. In this paper, we propose the HR-DSS approach, which stands for Human Resource Decision Support System, and uses explainable AI for employee attrition problems. The system is designed to assist human resource departments in interpreting the predictions provided by machine learning models. In our experiments, eight machine learning models are employed to provide predictions, and the results achieved by the best-performing model are further processed by the SHAP explainability process. We optimize both the correctness and explanation of the results. Furthermore, using "What-if-analysis", we aim to observe plausible causes for attrition of an individual employee. The results show that by adjusting the specific dominant features of each individual, employee attrition can turn into employee retention through informative business decisions. Reducing attrition is not only a problem for any specific organization but also, in some countries, becomes a significant societal problem that impacts the well-being of both employers and employees.

Via

Access Paper or Ask Questions

Open Data and the Status Quo -- A Fine-Grained Evaluation Framework for Open Data Quality and an Analysis of Open Data portals in Germany

Jun 17, 2021

Lisa Wenige, Claus Stadler, Michael Martin, Richard Figura, Robert Sauter, Christopher W. Frank

Figure 1 for Open Data and the Status Quo -- A Fine-Grained Evaluation Framework for Open Data Quality and an Analysis of Open Data portals in Germany

Figure 2 for Open Data and the Status Quo -- A Fine-Grained Evaluation Framework for Open Data Quality and an Analysis of Open Data portals in Germany

Figure 3 for Open Data and the Status Quo -- A Fine-Grained Evaluation Framework for Open Data Quality and an Analysis of Open Data portals in Germany

Figure 4 for Open Data and the Status Quo -- A Fine-Grained Evaluation Framework for Open Data Quality and an Analysis of Open Data portals in Germany

Abstract:This paper presents a framework for assessing data and metadata quality within Open Data portals. Although a few benchmark frameworks already exist for this purpose, they are not yet detailed enough in both breadth and depth to make valid statements about the actual discoverability and accessibility of publicly available data collections. To address this research gap, we have designed a quality framework that is able to evaluate data quality in Open Data portals on dedicated and fine-grained dimensions, such as interoperability, findability, uniqueness or completeness. Additionally, we propose quality measures that allow for valid assessments regarding cross-portal findability and uniqueness of dataset descriptions. We have validated our novel quality framework for the German Open Data landscape and found out that metadata often still lacks meaningful descriptions and is not yet extensively connected to the Semantic Web.

Via

Access Paper or Ask Questions