Title: Automatic Evaluation of Attribution by Large Language Models

May 10, 2023

Xiang Yue, Boshi Wang, Kai Zhang, Ziru Chen, Yu Su, Huan Sun

Abstract: A recent focus of large language model (LLM) development, as exemplified by generative search engines, is to incorporate external references to generate and support their claims. However, evaluating the attribution, i.e., verifying whether the generated statement is indeed fully supported by the cited reference, remains an open problem. Although human evaluation is common practice, it is costly and time-consuming. In this paper, we investigate the automatic evaluation of attribution by LLMs. We begin by providing a definition of attribution and then explore two approaches for automatic evaluation: prompting LLMs and fine-tuning smaller LMs. The fine-tuning data is repurposed from related tasks, such as question answering, fact-checking, natural language inference, and summarization. To facilitate the evaluation, we manually curate a set of test examples covering 12 domains from a generative search engine, New Bing. Our results on the curated test set and on simulated test examples from existing benchmark questions highlight both promising signals and remaining challenges for the automatic evaluation of attribution. We hope our testbed, modeling methodology, and insights will help lay the foundation for future studies on this important problem.
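
The abstract describes two ingredients: repurposing data from related tasks such as natural language inference, and prompting an LLM to judge whether a cited reference fully supports a generated statement. The sketch below illustrates both under stated assumptions. The three-way label names, the NLI-to-attribution mapping, the prompt wording, and the helpers (from_nli, is_fully_supported, build_prompt, parse_verdict) are hypothetical illustrations, not the paper's released code or exact method.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: hypothetical three-way label scheme and NLI mapping,
# chosen for illustration; not taken from the paper's code.
NLI_TO_ATTRIBUTION = {
    "entailment": "attributable",      # reference fully supports the claim
    "contradiction": "contradictory",  # reference contradicts the claim
    "neutral": "extrapolatory",        # reference lacks enough support
}
LABELS = tuple(NLI_TO_ATTRIBUTION.values())


@dataclass
class AttributionExample:
    claim: str      # the generated statement to verify
    reference: str  # the cited reference text
    label: str      # one of LABELS


def from_nli(premise: str, hypothesis: str, nli_label: str) -> AttributionExample:
    """Repurpose an NLI pair: the premise acts as the cited reference and
    the hypothesis acts as the generated claim (hypothetical mapping)."""
    return AttributionExample(
        claim=hypothesis,
        reference=premise,
        label=NLI_TO_ATTRIBUTION[nli_label],
    )


def is_fully_supported(example: AttributionExample) -> bool:
    """Binary view of the definition: is the claim fully supported?"""
    return example.label == "attributable"


def build_prompt(claim: str, reference: str) -> str:
    """Hypothetical zero-shot prompt asking an LLM to judge attribution."""
    return (
        "Verify whether the claim is fully supported by the reference.\n"
        f"Claim: {claim}\n"
        f"Reference: {reference}\n"
        f"Answer with one of: {', '.join(LABELS)}."
    )


def parse_verdict(response: str) -> Optional[str]:
    """Naively return the first label, checked in LABELS order, that
    appears in the LLM reply; return None if no label appears."""
    text = response.lower()
    for label in LABELS:
        if label in text:
            return label
    return None


if __name__ == "__main__":
    ex = from_nli("Paris is the capital of France.",
                  "France's capital is Paris.", "entailment")
    print(ex.label, is_fully_supported(ex))  # attributable True
    print(build_prompt(ex.claim, ex.reference))
```

The keyword matching in parse_verdict is deliberately simple: a reply such as "not attributable" would still match "attributable", so a practical evaluator would need stricter output constraints or a fine-tuned classifier instead of free-form parsing.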
