Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Saad Obaid ul Islam

How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild

Feb 20, 2025

Saad Obaid ul Islam, Anne Lauscher, Goran Glavaš

Figure 1 for How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild

Figure 2 for How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild

Figure 3 for How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild

Figure 4 for How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild

Abstract:In the age of misinformation, hallucination -- the tendency of Large Language Models (LLMs) to generate non-factual or unfaithful responses -- represents the main risk for their global utility. Despite LLMs becoming increasingly multilingual, the vast majority of research on detecting and quantifying LLM hallucination are (a) English-centric and (b) focus on machine translation (MT) and summarization, tasks that are less common ``in the wild'' than open information seeking. In contrast, we aim to quantify the extent of LLM hallucination across languages in knowledge-intensive long-form question answering. To this end, we train a multilingual hallucination detection model and conduct a large-scale study across 30 languages and 6 open-source LLM families. We start from an English hallucination detection dataset and rely on MT to generate (noisy) training data in other languages. We also manually annotate gold data for five high-resource languages; we then demonstrate, for these languages, that the estimates of hallucination rates are similar between silver (LLM-generated) and gold test sets, validating the use of silver data for estimating hallucination rates for other languages. For the final rates estimation, we build a knowledge-intensive QA dataset for 30 languages with LLM-generated prompts and Wikipedia articles as references. We find that, while LLMs generate longer responses with more hallucinated tokens for higher-resource languages, there is no correlation between length-normalized hallucination rates of languages and their digital representation. Further, we find that smaller LLMs exhibit larger hallucination rates than larger models.

Via

Access Paper or Ask Questions

Tackling Hallucinations in Neural Chart Summarization

Aug 01, 2023

Saad Obaid ul Islam, Iza Škrjanec, Ondřej Dušek, Vera Demberg

Abstract:Hallucinations in text generation occur when the system produces text that is not grounded in the input. In this work, we tackle the problem of hallucinations in neural chart summarization. Our analysis shows that the target side of chart summarization training datasets often contains additional information, leading to hallucinations. We propose a natural language inference (NLI) based method to preprocess the training data and show through human evaluation that our method significantly reduces hallucinations. We also found that shortening long-distance dependencies in the input sequence and adding chart-related information like title and legends improves the overall performance.

* To be presented in INLG 2023

Via

Access Paper or Ask Questions