Filip Trhlik

Rethinking AI Cultural Evaluation

Jan 13, 2025

Michal Bravansky, Filip Trhlik, Fazl Barez

Abstract: As AI systems become more integrated into society, evaluating their capacity to align with diverse cultural values is crucial for their responsible deployment. Current evaluation methods predominantly rely on multiple-choice question (MCQ) datasets. In this study, we demonstrate that MCQs are insufficient for capturing the complexity of cultural values expressed in open-ended scenarios. Our findings highlight significant discrepancies between MCQ-based assessments and the values conveyed in unconstrained interactions. Based on these findings, we recommend moving beyond MCQs to adopt more open-ended, context-specific assessments that better reflect how AI models engage with cultural values in realistic settings.

Quantifying Generative Media Bias with a Corpus of Real-world and Generated News Articles

Jun 16, 2024

Filip Trhlik, Pontus Stenetorp

Abstract: Large language models (LLMs) are increasingly being utilised across a range of tasks and domains, with a burgeoning interest in their application within the field of journalism. This trend raises concerns due to our limited understanding of LLM behaviour in this domain, especially with respect to political bias. Existing studies predominantly focus on LLMs undertaking political questionnaires, which offers only limited insights into their biases and operational nuances. To address this gap, our study establishes a new curated dataset that contains 2,100 human-written articles and utilises their descriptions to generate 56,700 synthetic articles using nine LLMs. This enables us to analyse shifts in properties between human-authored and machine-generated articles. This study focuses on political bias, which we detect using both supervised models and LLMs. Our findings reveal significant disparities between base and instruction-tuned LLMs, with instruction-tuned models exhibiting consistent political bias. Furthermore, we study how LLMs behave as classifiers and observe that they display political bias even in this role. Overall, for the first time within the journalistic domain, this study outlines a framework and provides a structured dataset for quantifiable experiments, serving as a foundation for further research into LLM political bias and its implications.

* 20 pages, 10 figures

RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors

May 13, 2024

Liam Dugan, Alyssa Hwang, Filip Trhlik, Josh Magnus Ludan, Andrew Zhu, Hainiu Xu, Daphne Ippolito, Chris Callison-Burch

Abstract: Many commercial and open-source models claim to detect machine-generated text with very high accuracy (99% or higher). However, very few of these detectors are evaluated on shared benchmark datasets, and even when they are, the datasets used for evaluation are insufficiently challenging, lacking variations in sampling strategy, adversarial attacks, and open-source generative models. In this work, we present RAID: the largest and most challenging benchmark dataset for machine-generated text detection. RAID includes over 6 million generations spanning 11 models, 8 domains, 11 adversarial attacks and 4 decoding strategies. Using RAID, we evaluate the out-of-domain and adversarial robustness of 8 open-source and 4 closed-source detectors and find that current detectors are easily fooled by adversarial attacks, variations in sampling strategies, repetition penalties, and unseen generative models. We release our dataset and tools to encourage further exploration into detector robustness.

* To appear at ACL 2024
