Barbara Plank

BoN Appetit Team at LeWiDi-2025: Best-of-N Test-time Scaling Can Not Stomach Annotation Disagreements (Yet)

Oct 14, 2025
Via arXiv

Standard-to-Dialect Transfer Trends Differ across Text and Speech: A Case Study on Intent and Topic Classification in German Dialects

Oct 09, 2025
Via arXiv

Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort

Oct 01, 2025
Via arXiv

Evaluating Large Language Models for Cross-Lingual Retrieval

Sep 18, 2025
Via arXiv

Revisiting Active Learning under (Human) Label Variation

Jul 03, 2025
Via arXiv

Evaluation Should Not Ignore Variation: On the Impact of Reference Set Choice on Summarization Metrics

Jun 17, 2025
Via arXiv

Do LLMs Give Psychometrically Plausible Responses in Educational Assessments?

Jun 11, 2025
Via arXiv

Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation

May 29, 2025
Via arXiv

LiTEx: A Linguistic Taxonomy of Explanations for Understanding Within-Label Variation in Natural Language Inference

May 28, 2025
Via arXiv

MAKIEval: A Multilingual Automatic WiKidata-based Framework for Cultural Awareness Evaluation for LLMs

May 27, 2025
Via arXiv