Title: Distinguishing Ignorance from Error in LLM Hallucinations

Oct 29, 2024

Adi Simhi, Jonathan Herzig, Idan Szpektor, Yonatan Belinkov

Abstract: Large language models (LLMs) are susceptible to hallucinations: outputs that are ungrounded, factually incorrect, or inconsistent with prior generations. We focus on closed-book Question Answering (CBQA), where previous work has not fully addressed the distinction between two possible kinds of hallucinations, namely, whether the model (1) does not hold the correct answer in its parameters or (2) answers incorrectly despite having the required knowledge. We argue that distinguishing these cases is crucial for detecting and mitigating hallucinations. Specifically, case (2) may be mitigated by intervening in the model's internal computation, as the knowledge resides within the model's parameters. In contrast, in case (1) there is no parametric knowledge to leverage for mitigation, so it should be addressed by resorting to an external knowledge source or abstaining. To help distinguish between the two cases, we introduce Wrong Answer despite having Correct Knowledge (WACK), an approach for constructing model-specific datasets for the second hallucination type. Our probing experiments indicate that the two kinds of hallucinations are represented differently in the model's inner states. Next, we show that datasets constructed using WACK exhibit variations across models, demonstrating that even when models share knowledge of certain facts, they still vary in the specific examples that lead to hallucinations. Finally, we show that training a probe on our WACK datasets leads to better detection of case (2) hallucinations than using the common generic one-size-fits-all datasets. The code is available at https://github.com/technion-cs-nlp/hallucination-mitigation .
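
The two cases in the abstract, and the mitigation suggested for each, can be illustrated with a small sketch. This is not the authors' WACK procedure or code: the `Example` fields, the `knows_answer` flag (assumed to come from some separate check of the model's parametric knowledge), and both function names are hypothetical, chosen only to make the distinction concrete.

```python
from dataclasses import dataclass
from enum import Enum


class Case(Enum):
    CORRECT = "no hallucination"
    IGNORANCE = "case 1: answer not held in the parameters"
    ERROR = "case 2: wrong answer despite correct knowledge"


@dataclass
class Example:
    question: str
    # Hypothetical flag: whether a separate knowledge check found that the
    # model holds the correct answer. How to compute it is not specified here.
    knows_answer: bool
    # Whether the model answered correctly in the setting being evaluated.
    answered_correctly: bool


def classify(example: Example) -> Case:
    """Assign an example to one of the two hallucination cases, or to CORRECT."""
    if example.answered_correctly:
        return Case.CORRECT
    return Case.ERROR if example.knows_answer else Case.IGNORANCE


def suggested_mitigation(case: Case) -> str:
    """Map each case to the mitigation direction named in the abstract."""
    if case is Case.IGNORANCE:
        return "use an external knowledge source or abstain"
    if case is Case.ERROR:
        return "intervene in the model's internal computation"
    return "none needed"


if __name__ == "__main__":
    examples = [
        Example("Q1", knows_answer=True, answered_correctly=True),
        Example("Q2", knows_answer=False, answered_correctly=False),
        Example("Q3", knows_answer=True, answered_correctly=False),
    ]
    for ex in examples:
        case = classify(ex)
        print(ex.question, "->", case.value, "|", suggested_mitigation(case))
```

In this sketch only the examples labeled `Case.ERROR` would belong in a dataset of the second hallucination type; since the abstract reports that such datasets vary across models, the labels would need to be recomputed for each model.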
