Lars Klöser

Profiling German Text Simplification with Interpretable Model-Fingerprints

Jan 19, 2026

Lars Klöser, Mika Beele, Bodo Kraft

Abstract: While Large Language Models (LLMs) produce highly nuanced text simplifications, developers currently lack tools for a holistic, efficient, and reproducible diagnosis of their behavior. This paper introduces the Simplification Profiler, a diagnostic toolkit that generates a multidimensional, interpretable fingerprint of simplified texts. Aggregating multiple simplifications produced by a model yields that model's fingerprint. This novel evaluation paradigm is particularly vital for languages, where the data scarcity problem is magnified when creating flexible models for diverse target groups rather than a single, fixed simplification style. We propose that, in this context, measuring a model's unique behavioral signature is more relevant than correlating metrics with human preferences. We operationalize this with a practical meta-evaluation of our fingerprints' descriptive power, which bypasses the need for large, human-rated datasets. This test measures whether a simple linear classifier can reliably identify various model configurations from the simplifications they produce, confirming that our metrics are sensitive to a model's specific characteristics. The Profiler can distinguish high-level behavioral variations between prompting strategies and fine-grained changes from prompt engineering, including few-shot examples. Our complete feature set achieves classification F1-scores of up to 71.9%, improving upon simple baselines by over 48 percentage points. The Simplification Profiler thus offers developers a granular, actionable analysis to build more effective and truly adaptive text simplification systems.

* Presented at 2nd International Conference on Explainable AI for Neural and Symbolic Systems
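
The meta-evaluation idea in the abstract above (can a simple linear classifier tell model configurations apart from the texts they produce?) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three surface features, the names `fingerprint`, `fit_centroids`, and `predict`, the configuration labels, and the toy German sentences are hypothetical and are not the Profiler's actual metrics or classifier. A nearest-centroid rule stands in for the unspecified linear classifier, since its score for each class is linear in the features.

```python
import re
from collections import defaultdict


def fingerprint(text):
    """Toy fingerprint (assumed features, not the paper's metrics):
    [words per sentence, mean word length, type-token ratio]."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    if not sentences or not words:
        return [0.0, 0.0, 0.0]
    return [
        len(words) / len(sentences),
        sum(len(w) for w in words) / len(words),
        len(set(words)) / len(words),
    ]


def fit_centroids(samples):
    """Fit a nearest-centroid classifier in linear form.

    samples: list of (feature_vector, label).
    Returns label -> (weights, bias) with weights = class centroid and
    bias = -0.5 * ||centroid||^2, so that the highest linear score
    corresponds to the closest centroid in Euclidean distance.
    """
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    model = {}
    for label, rows in grouped.items():
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        bias = -0.5 * sum(c * c for c in centroid)
        model[label] = (centroid, bias)
    return model


def predict(model, features):
    """Return the label whose linear score w.x + b is highest."""
    def score(label):
        weights, bias = model[label]
        return sum(w * x for w, x in zip(weights, features)) + bias
    return max(model, key=score)


# Hypothetical outputs of two model configurations (invented toy data).
outputs = {
    "config_a": [
        "Das ist gut. Das ist kurz.",
        "Wir gehen heim. Es ist spät.",
    ],
    "config_b": [
        "Die Verwaltung informiert die Bürger ausführlich über sämtliche Änderungen.",
        "Der Antrag muss vollständig ausgefüllt bei der zuständigen Behörde eingereicht werden.",
    ],
}

training = [
    (fingerprint(text), label)
    for label, texts in outputs.items()
    for text in texts
]
model = fit_centroids(training)

# Held-out texts: one in each style.
short_guess = predict(model, fingerprint("Ich bin da. Du bist hier."))
long_guess = predict(
    model,
    fingerprint("Die Gemeinde veröffentlicht regelmäßig umfangreiche Mitteilungen."),
)
```

If the features carry no information about the configuration, such a classifier cannot do better than chance, which is why classification quality serves as a proxy for the descriptive power of the fingerprint. A real evaluation would use many more texts, standardized features, and held-out F1-scores.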

German Text Simplification: Finetuning Large Language Models with Semi-Synthetic Data

Feb 16, 2024

Lars Klöser, Mika Beele, Jan-Niklas Schagen, Bodo Kraft

Abstract: This study pioneers the use of synthetically generated data for training generative models in document-level text simplification of German texts. We demonstrate the effectiveness of our approach with real-world online texts. Addressing the challenge of data scarcity in language simplification, we crawled professionally simplified German texts and synthesized a corpus using GPT-4. We finetune Large Language Models with up to 13 billion parameters on this data and evaluate their performance. This paper employs various methodologies for evaluation and demonstrates the limitations of currently used rule-based metrics. Both automatic and manual evaluations reveal that our models can significantly simplify real-world online texts, indicating the potential of synthetic data in improving text simplification.

* Accepted at Fourth Workshop on Language Technology for Equality, Diversity, Inclusion - EACL 2024

Explaining Relation Classification Models with Semantic Extents

Aug 04, 2023

Lars Klöser, Andre Büsgen, Philipp Kohl, Bodo Kraft, Albert Zündorf

Abstract: In recent years, the development of large pretrained language models, such as BERT and GPT, significantly improved information extraction systems on various tasks, including relation classification. State-of-the-art systems are highly accurate on scientific benchmarks. A lack of explainability is currently a complicating factor in many real-world applications. Comprehensible systems are necessary to prevent biased, counterintuitive, or harmful decisions. We introduce semantic extents, a concept to analyze decision patterns for the relation classification task. Semantic extents are the most influential parts of texts concerning classification decisions. Our definition allows similar procedures to determine semantic extents for humans and models. We provide an annotation tool and a software framework to determine semantic extents for humans and models conveniently and reproducibly. Comparing both reveals that models tend to learn shortcut patterns from data. These patterns are hard to detect with current interpretability methods, such as input reductions. Our approach can help detect and eliminate spurious decision patterns during model development. Semantic extents can increase the reliability and security of natural language processing systems. Semantic extents are an essential step in enabling applications in critical areas like healthcare or finance. Moreover, our work opens new research directions for developing methods to explain deep learning models.

* Accepted at DeLTA 2023: Deep Learning Theory and Applications conference

Multi-Attribute Relation Extraction (MARE) -- Simplifying the Application of Relation Extraction

Nov 17, 2021

Lars Klöser, Philipp Kohl, Bodo Kraft, Albert Zündorf

Abstract: Relation extraction, a task in natural language understanding, makes innovative and encouraging novel business concepts possible and facilitates new digitalized decision-making processes. Current approaches allow the extraction of relations with a fixed number of entities as attributes. Extracting relations with an arbitrary number of attributes requires complex systems and costly relation-trigger annotations to assist these systems. We introduce multi-attribute relation extraction (MARE) as an assumption-less problem formulation with two approaches, facilitating an explicit mapping from business use cases to the data annotations. Avoiding elaborate annotation constraints simplifies the application of relation extraction approaches. The evaluation compares our models to current state-of-the-art event extraction and binary relation extraction methods. Our approaches show improvement compared to these on the extraction of general multi-attribute relations.

* Proceedings of the 2nd International Conference on Deep Learning Theory and Applications, Vol. 1 (2021), pp. 148-156
* Preprint of short paper for the 2nd International Conference on Deep Learning Theory and Applications (2021)
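
To make the contrast between fixed-arity and multi-attribute relations concrete, here is a small data-structure sketch. It is only an illustration under stated assumptions: the class names `Attribute` and `MultiAttributeRelation`, the helper `span`, the role names, and the example sentence are hypothetical and do not reproduce the annotation schema used in the MARE paper. The point it shows is that a relation carries a label plus any number of role-tagged entity spans, with no separate trigger annotation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Attribute:
    """An entity span that fills a role in a relation (hypothetical schema)."""
    role: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


@dataclass
class MultiAttributeRelation:
    """A relation label with an arbitrary number of attributes.

    A binary relation is the special case with exactly two attributes.
    """
    label: str
    attributes: List[Attribute] = field(default_factory=list)

    def roles(self):
        """Return the sorted role names present in this relation."""
        return sorted(a.role for a in self.attributes)


def span(text, phrase, role):
    """Locate the first occurrence of `phrase` in `text` and wrap it
    as an Attribute (raises ValueError if the phrase is absent)."""
    start = text.index(phrase)
    return Attribute(role=role, start=start, end=start + len(phrase))


# Invented example sentence and roles, for illustration only.
text = "Acme GmbH hired Anna Schmidt as CTO in March 2020."

hiring = MultiAttributeRelation(
    label="Hiring",
    attributes=[
        span(text, "Acme GmbH", "employer"),
        span(text, "Anna Schmidt", "employee"),
        span(text, "CTO", "position"),
        span(text, "March 2020", "date"),
    ],
)

# Recover the surface strings from the stored offsets.
surface = {a.role: text[a.start:a.end] for a in hiring.attributes}
```

Because the attribute list has no fixed length, the same structure covers a two-entity relation and a four-entity one without changing the annotation format.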

STAMP 4 NLP -- An Agile Framework for Rapid Quality-Driven NLP Applications Development

Nov 16, 2021

Philipp Kohl, Oliver Schmidts, Lars Klöser, Henri Werth, Bodo Kraft, Albert Zündorf

Abstract: The progress in natural language processing (NLP) research in recent years offers novel business opportunities for companies, such as automated user interaction or improved data analysis. Building sophisticated NLP applications requires dealing with modern machine learning (ML) technologies, which impedes enterprises from establishing successful NLP projects. Our experience in applied NLP research projects shows that the continuous integration of research prototypes in production-like environments with quality assurance builds trust in the software and shows convenience and usefulness regarding the business goal. We introduce STAMP 4 NLP as an iterative and incremental process model for developing NLP applications. With STAMP 4 NLP, we merge software engineering principles with best practices from data science. Instantiating our process model allows efficiently creating prototypes by utilizing templates, conventions, and implementations, enabling developers and data scientists to focus on the business goals. Due to our iterative-incremental approach, businesses can deploy an enhanced version of the prototype to their software environment after every iteration, maximizing potential business value and trust early and avoiding the cost of successful yet never deployed experiments.

* Quality of Information and Communications Technology, 2021, pp. 156-166
* Preprint of short paper for QUATIC 2021 conference
