Yeonseok Jeong

Agent-as-Judge for Factual Summarization of Long Narratives

Jan 17, 2025

Yeonseok Jeong, Minsoo Kim, Seung-won Hwang, Byung-Hak Kim

Abstract: Large Language Models (LLMs) have demonstrated near-human performance in summarization tasks based on traditional metrics such as ROUGE and BERTScore. However, these metrics do not adequately capture critical aspects of summarization quality, such as factual accuracy, particularly for long narratives (>100K tokens). Recent advances, such as LLM-as-a-Judge, address the limitations of metrics based on lexical similarity but still exhibit factual inconsistencies, especially in understanding character relationships and states. In this work, we introduce NarrativeFactScore, a novel "Agent-as-a-Judge" framework for evaluating and refining summaries. By leveraging a Character Knowledge Graph (CKG) extracted from the input and the generated summaries, NarrativeFactScore assesses factual consistency and provides actionable guidance for refinement, such as identifying missing or erroneous facts. We demonstrate the effectiveness of NarrativeFactScore through a detailed workflow illustration and extensive validation on widely adopted benchmarks, achieving superior performance compared to competitive methods. Our results highlight the potential of agent-driven evaluation systems to improve the factual reliability of LLM-generated summaries.
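
As a rough illustration of the idea in the abstract above, the sketch below compares character facts found in a summary against facts extracted from the source narrative and flags the ones that lack support. It is a toy under stated assumptions, not the paper's NarrativeFactScore: the `Triple` type, the `check_summary` function, the exact-match comparison, and the ratio used as a score are all hypothetical, and the facts are written by hand where a real system would extract them.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One character fact, e.g. (Anna, sister_of, Ben). Hypothetical schema."""
    subject: str
    relation: str
    obj: str


def check_summary(source_facts: set[Triple], summary_facts: set[Triple]):
    """Return (score, unsupported) for a summary.

    score is the fraction of summary facts that also appear in the source
    facts; unsupported lists the summary facts that do not. Exact matching
    is an assumption made to keep the sketch short.
    """
    supported = summary_facts & source_facts
    unsupported = sorted(summary_facts - source_facts)
    score = len(supported) / len(summary_facts) if summary_facts else 1.0
    return score, unsupported


# Hand-written facts standing in for an extracted character knowledge graph.
source = {Triple("Anna", "sister_of", "Ben"), Triple("Ben", "works_for", "Clara")}
summary = {Triple("Anna", "sister_of", "Ben"), Triple("Ben", "married_to", "Clara")}

score, unsupported = check_summary(source, summary)
print(score)        # fraction of summary facts backed by the source
print(unsupported)  # facts a refinement step could be asked to fix
```

The unsupported facts play the role of the "actionable guidance" mentioned in the abstract: they could be passed back to the summarizer as items to correct.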

On Monotonic Aggregation for Open-domain QA

Aug 08, 2023

Sang-eun Han, Yeonseok Jeong, Seung-won Hwang, Kyungjae Lee

Abstract: Question answering (QA) is a critical task for speech-based retrieval from knowledge sources, as it surfaces only the answers without requiring users to read supporting documents. Specifically, open-domain QA aims to answer user questions on unrestricted knowledge sources. Ideally, adding a source should not decrease accuracy, but we find that this property (denoted as "monotonicity") does not hold for current state-of-the-art methods. We identify the cause and, based on it, propose the Judge-Specialist framework. Our framework consists of (1) specialist retrievers/readers to cover individual sources, and (2) a judge, a dedicated language model that selects the final answer. Our experiments show that our framework not only ensures monotonicity but also outperforms state-of-the-art multi-source QA methods on Natural Questions. Additionally, we show that our models robustly preserve monotonicity against noise from speech recognition. We publicly release our code and settings.

* INTERSPEECH 2023 Camera Ready
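
The two-part structure described in the abstract above (one specialist per source, plus a judge that picks the final answer) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' released code: the names `answer_question`, `majority_judge`, `wiki_specialist`, and `table_specialist` are hypothetical, the specialists are dictionary lookups, and a majority vote stands in for the judge, which in the paper is a dedicated language model.

```python
from collections import Counter
from typing import Callable, Optional

# A specialist maps a question to a candidate answer, or None if it has none.
Specialist = Callable[[str], Optional[str]]


def majority_judge(question: str, candidates: dict[str, str]) -> Optional[str]:
    """Placeholder judge: returns the most frequent candidate answer."""
    if not candidates:
        return None
    return Counter(candidates.values()).most_common(1)[0][0]


def answer_question(question: str,
                    specialists: dict[str, Specialist],
                    judge=majority_judge) -> Optional[str]:
    """Ask every per-source specialist, then let the judge choose."""
    candidates = {}
    for source, specialist in specialists.items():
        candidate = specialist(question)
        if candidate is not None:
            candidates[source] = candidate
    return judge(question, candidates)


# Toy specialists backed by lookup tables (hypothetical data).
def wiki_specialist(question: str) -> Optional[str]:
    return {"capital of France?": "Paris"}.get(question)


def table_specialist(question: str) -> Optional[str]:
    return {"capital of Peru?": "Lima"}.get(question)


specialists = {"wiki": wiki_specialist, "tables": table_specialist}
print(answer_question("capital of France?", specialists))
```

Because each source is handled by its own specialist, adding a source only adds a candidate for the judge to consider. Whether accuracy is preserved then depends on the judge, which is the component the abstract credits with ensuring monotonicity.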
