Mika Beele

German Text Simplification: Finetuning Large Language Models with Semi-Synthetic Data

Feb 16, 2024

Lars Klöser, Mika Beele, Jan-Niklas Schagen, Bodo Kraft

Abstract: This study pioneers the use of synthetically generated data for training generative models in document-level text simplification of German texts. We demonstrate the effectiveness of our approach with real-world online texts. Addressing the challenge of data scarcity in language simplification, we crawled professionally simplified German texts and synthesized a corpus using GPT-4. We finetune Large Language Models with up to 13 billion parameters on this data and evaluate their performance. This paper employs various methodologies for evaluation and demonstrates the limitations of currently used rule-based metrics. Both automatic and manual evaluations reveal that our models can significantly simplify real-world online texts, indicating the potential of synthetic data in improving text simplification.

* Accepted at the Fourth Workshop on Language Technology for Equality, Diversity, Inclusion (EACL 2024)
