Title: UFO: a Unified and Flexible Framework for Evaluating Factuality of Large Language Models

Feb 22, 2024

Zhaoheng Huang, Zhicheng Dou, Yutao Zhu, Ji-rong Wen

Abstract: Large language models (LLMs) may generate text that is inconsistent with human knowledge, leading to factual inaccuracies or hallucination. Existing research on evaluating the factuality of LLMs involves extracting fact claims using an LLM and verifying them against a predefined fact source. However, these evaluation metrics are task-specific and not scalable, and the substitutability of fact sources across different tasks is under-explored. To address these challenges, we categorize four available fact sources (human-written evidence, reference documents, search engine results, and LLM knowledge), along with five text generation tasks containing six representative datasets. Then, we propose UFO, an LLM-based unified and flexible evaluation framework to verify facts against plug-and-play fact sources. We implement five evaluation scenarios based on this framework. Experimental results show that for most QA tasks, human-written evidence and reference documents are crucial, and they can substitute for each other in retrieval-augmented QA tasks. In news fact generation tasks, search engine results and LLM knowledge are essential. Our dataset and code are available at https://github.com/WaldenRUC/UFO.
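
The extract-then-verify idea with interchangeable fact sources can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); every name below (`FactSource`, `extract_claims`, `make_lookup_source`, `factuality_score`) is hypothetical. The sentence-splitting extractor and exact-match lookup are toy stand-ins for the LLM-based steps, and scoring a text as the fraction of supported claims is an assumption made for illustration.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical interface: a fact source maps a claim to supporting
# evidence, or returns None when it has nothing to offer.
FactSource = Callable[[str], Optional[str]]


def extract_claims(text: str) -> List[str]:
    """Toy stand-in for LLM-based claim extraction: one claim per sentence."""
    return [s.strip() for s in text.split(".") if s.strip()]


def make_lookup_source(known_facts: Sequence[str]) -> FactSource:
    """Toy fact source: case-insensitive exact match against a fixed list."""
    index = {fact.lower(): fact for fact in known_facts}
    return lambda claim: index.get(claim.lower())


def factuality_score(text: str, sources: Sequence[FactSource]) -> float:
    """Fraction of extracted claims supported by at least one source.

    Assumption for illustration only: a claim counts as verified as soon
    as any source in the list returns evidence for it.
    """
    claims = extract_claims(text)
    if not claims:
        return 0.0
    supported = sum(
        1 for claim in claims
        if any(source(claim) is not None for source in sources)
    )
    return supported / len(claims)


# Two interchangeable ("plug-and-play") sources with made-up contents.
reference_docs = make_lookup_source(["Paris is the capital of France"])
llm_knowledge = make_lookup_source(["Water boils at 100 C at sea level"])

generated = (
    "Paris is the capital of France. "
    "Water boils at 100 C at sea level. "
    "The moon is made of cheese."
)

score_docs_only = factuality_score(generated, [reference_docs])
score_both = factuality_score(generated, [reference_docs, llm_knowledge])
```

Because the scorer only depends on the `FactSource` signature, swapping or combining sources changes the list passed in and nothing else, which is the sense of "plug-and-play" this sketch aims to convey.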

* under review
