Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Oliver Kraus

Evaluation Should Not Ignore Variation: On the Impact of Reference Set Choice on Summarization Metrics

Jun 17, 2025

Silvia Casola, Yang Janet Liu, Siyao Peng, Oliver Kraus, Albert Gatt, Barbara Plank

Abstract:Human language production exhibits remarkable richness and variation, reflecting diverse communication styles and intents. However, this variation is often overlooked in summarization evaluation. While having multiple reference summaries is known to improve correlation with human judgments, the impact of using different reference sets on reference-based metrics has not been systematically investigated. This work examines the sensitivity of widely used reference-based metrics in relation to the choice of reference sets, analyzing three diverse multi-reference summarization datasets: SummEval, GUMSum, and DUC2004. We demonstrate that many popular metrics exhibit significant instability. This instability is particularly concerning for n-gram-based metrics like ROUGE, where model rankings vary depending on the reference sets, undermining the reliability of model comparisons. We also collect human judgments on LLM outputs for genre-diverse data and examine their correlation with metrics to supplement existing findings beyond newswire summaries, finding weak-to-no correlation. Taken together, we recommend incorporating reference set variation into summarization evaluation to enhance consistency alongside correlation with human judgments, especially when evaluating LLMs.

* 17 pages, 13 figures

Via

Access Paper or Ask Questions

Cross-Dialect Information Retrieval: Information Access in Low-Resource and High-Variance Languages

Dec 17, 2024

Robert Litschko, Oliver Kraus, Verena Blaschke, Barbara Plank

Figure 1 for Cross-Dialect Information Retrieval: Information Access in Low-Resource and High-Variance Languages

Figure 2 for Cross-Dialect Information Retrieval: Information Access in Low-Resource and High-Variance Languages

Figure 3 for Cross-Dialect Information Retrieval: Information Access in Low-Resource and High-Variance Languages

Figure 4 for Cross-Dialect Information Retrieval: Information Access in Low-Resource and High-Variance Languages

Abstract:A large amount of local and culture-specific knowledge (e.g., people, traditions, food) can only be found in documents written in dialects. While there has been extensive research conducted on cross-lingual information retrieval (CLIR), the field of cross-dialect retrieval (CDIR) has received limited attention. Dialect retrieval poses unique challenges due to the limited availability of resources to train retrieval models and the high variability in non-standardized languages. We study these challenges on the example of German dialects and introduce the first German dialect retrieval dataset, dubbed WikiDIR, which consists of seven German dialects extracted from Wikipedia. Using WikiDIR, we demonstrate the weakness of lexical methods in dealing with high lexical variation in dialects. We further show that commonly used zero-shot cross-lingual transfer approach with multilingual encoders do not transfer well to extremely low-resource setups, motivating the need for resource-lean and dialect-specific retrieval models. We finally demonstrate that (document) translation is an effective way to reduce the dialect gap in CDIR.

* Accepted at COLING 2025

Via

Access Paper or Ask Questions

Sign Language Sense Disambiguation

Sep 13, 2024

Jana Grimm, Miriam Winkler, Oliver Kraus, Tanalp Agustoslu

Abstract:This project explores methods to enhance sign language translation of German sign language, specifically focusing on disambiguation of homonyms. Sign language is ambiguous and understudied which is the basis for our experiments. We approach the improvement by training transformer-based models on various bodypart representations to shift the focus on said bodypart. To determine the impact of, e.g., the hand or mouth representations, we experiment with different combinations. The results show that focusing on the mouth increases the performance in small dataset settings while shifting the focus on the hands retrieves better results in larger dataset settings. Our results contribute to better accessibility for non-hearing persons by improving the systems powering digital assistants, enabling a more accurate interaction. The code for this project can be found on GitHub.

* LIMO2024 @ KONVENS 2024, 8 pages, 3 figures

Via

Access Paper or Ask Questions