Title: STORYSUMM: Evaluating Faithfulness in Story Summarization

Jul 09, 2024

Melanie Subbiah, Faisal Ladhak, Akankshya Mishra, Griffin Adams, Lydia B. Chilton, Kathleen McKeown

Abstract: Human evaluation has been the gold standard for checking faithfulness in abstractive summarization. However, with a challenging source domain like narrative, multiple annotators can agree a summary is faithful, while missing details that are obvious errors only once pointed out. We therefore introduce a new dataset, STORYSUMM, comprising LLM summaries of short stories with localized faithfulness labels and error explanations. This benchmark is for evaluation methods, testing whether a given method can detect challenging inconsistencies. Using this dataset, we first show that any one human annotation protocol is likely to miss inconsistencies, and we advocate for pursuing a range of methods when establishing ground truth for a summarization dataset. We finally test recent automatic metrics and find that none of them achieve more than 70% balanced accuracy on this task, demonstrating that it is a challenging benchmark for future work in faithfulness evaluation.
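The abstract reports results in balanced accuracy. As a reminder of what that figure measures, here is a minimal sketch of the standard definition for a binary faithful/unfaithful task: the mean of the recall on each class. The function name and the toy labels are illustrative assumptions and are not taken from the paper or the STORYSUMM data.

```python
def balanced_accuracy(labels, predictions):
    """Mean of per-class recall for binary labels (True = faithful).

    Illustrative helper only; not the paper's evaluation code.
    """
    pairs = list(zip(labels, predictions))
    pos = sum(1 for y, _ in pairs if y)          # faithful summaries
    neg = len(pairs) - pos                       # unfaithful summaries
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for y, p in pairs if y and p)             # faithful, predicted faithful
    tn = sum(1 for y, p in pairs if not y and not p)     # unfaithful, predicted unfaithful
    return 0.5 * (tp / pos + tn / neg)


# Toy data (made up): 8 faithful and 2 unfaithful summaries.
toy_labels = [True] * 8 + [False] * 2
# A detector that calls everything faithful is right on 8 of 10 items,
# but its balanced accuracy is only chance level.
always_faithful = [True] * 10
print(balanced_accuracy(toy_labels, always_faithful))  # 0.5
```

Because each class contributes equally, a method cannot score well simply by predicting the majority label, which matters when most summaries in a set are faithful.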
