Title: Factored Verification: Detecting and Reducing Hallucination in Summaries of Academic Papers

Oct 16, 2023

Charlie George, Andreas Stuhlmüller

Abstract: Hallucination plagues even frontier LLMs, but how bad is it really for summarizing academic papers? We evaluate Factored Verification, a simple automated method for detecting hallucinations in abstractive summaries. This method sets a new SotA on hallucination detection in the summarization task of the HaluEval benchmark, achieving 76.2% accuracy. We then use this method to estimate how often language models hallucinate when summarizing across multiple academic papers and find 0.62 hallucinations in the average ChatGPT (16k) summary, 0.84 for GPT-4, and 1.55 for Claude 2. We ask models to self-correct using Factored Critiques and find that this lowers the number of hallucinations to 0.49 for ChatGPT, 0.46 for GPT-4, and 0.95 for Claude 2. The hallucinations we find are often subtle, so we advise caution when using models to synthesize academic papers.
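
To put the reported before and after figures on a common scale, the short sketch below computes the relative drop in hallucinations per summary for each model. The input numbers are taken directly from the abstract; the helper name `relative_reduction` and the percentage framing are illustrative assumptions, not something the paper reports.

```python
# Mean hallucinations per summary as reported in the abstract:
# (before self-correction, after self-correction with Factored Critiques).
reported = {
    "ChatGPT": (0.62, 0.49),
    "GPT-4": (0.84, 0.46),
    "Claude 2": (1.55, 0.95),
}


def relative_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after` (hypothetical helper)."""
    return 100.0 * (before - after) / before


# Round to one decimal place; the source figures only have two decimals.
reductions = {
    model: round(relative_reduction(before, after), 1)
    for model, (before, after) in reported.items()
}

for model, pct in reductions.items():
    print(f"{model}: {pct}% fewer hallucinations per summary")
```

Under these figures the drop is roughly 21% for ChatGPT, 45% for GPT-4, and 39% for Claude 2, so every model still averages close to half a hallucination or more per summary after self-correction.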

* Second Workshop on Information Extraction from Scientific Publications (WIESP) at IJCNLP-AACL 2023
