Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Vasudevan Jagannathan

Revisiting text decomposition methods for NLI-based factuality scoring of summaries

Nov 30, 2022

John Glover, Federico Fancellu, Vasudevan Jagannathan, Matthew R. Gormley, Thomas Schaaf

Figure 1 for Revisiting text decomposition methods for NLI-based factuality scoring of summaries

Figure 2 for Revisiting text decomposition methods for NLI-based factuality scoring of summaries

Figure 3 for Revisiting text decomposition methods for NLI-based factuality scoring of summaries

Figure 4 for Revisiting text decomposition methods for NLI-based factuality scoring of summaries

Abstract:Scoring the factuality of a generated summary involves measuring the degree to which a target text contains factual information using the input document as support. Given the similarities in the problem formulation, previous work has shown that Natural Language Inference models can be effectively repurposed to perform this task. As these models are trained to score entailment at a sentence level, several recent studies have shown that decomposing either the input document or the summary into sentences helps with factuality scoring. But is fine-grained decomposition always a winning strategy? In this paper we systematically compare different granularities of decomposition -- from document to sub-sentence level, and we show that the answer is no. Our results show that incorporating additional context can yield improvement, but that this does not necessarily apply to all datasets. We also show that small changes to previously proposed entailment-based scoring methods can result in better performance, highlighting the need for caution in model and methodology selection for downstream tasks.

* Generation, Evaluation & Metrics (GEM) Workshop 2022

Via

Access Paper or Ask Questions

Leveraging Pretrained Models for Automatic Summarization of Doctor-Patient Conversations

Sep 24, 2021

Longxiang Zhang, Renato Negrinho, Arindam Ghosh, Vasudevan Jagannathan, Hamid Reza Hassanzadeh, Thomas Schaaf, Matthew R. Gormley

Figure 1 for Leveraging Pretrained Models for Automatic Summarization of Doctor-Patient Conversations

Figure 2 for Leveraging Pretrained Models for Automatic Summarization of Doctor-Patient Conversations

Figure 3 for Leveraging Pretrained Models for Automatic Summarization of Doctor-Patient Conversations

Figure 4 for Leveraging Pretrained Models for Automatic Summarization of Doctor-Patient Conversations

Abstract:Fine-tuning pretrained models for automatically summarizing doctor-patient conversation transcripts presents many challenges: limited training data, significant domain shift, long and noisy transcripts, and high target summary variability. In this paper, we explore the feasibility of using pretrained transformer models for automatically summarizing doctor-patient conversations directly from transcripts. We show that fluent and adequate summaries can be generated with limited training data by fine-tuning BART on a specially constructed dataset. The resulting models greatly surpass the performance of an average human annotator and the quality of previous published work for the task. We evaluate multiple methods for handling long conversations, comparing them to the obvious baseline of truncating the conversation to fit the pretrained model length limit. We introduce a multistage approach that tackles the task by learning two fine-tuned models: one for summarizing conversation chunks into partial summaries, followed by one for rewriting the collection of partial summaries into a complete summary. Using a carefully chosen fine-tuning dataset, this method is shown to be effective at handling longer conversations, improving the quality of generated summaries. We conduct both an automatic evaluation (through ROUGE and two concept-based metrics focusing on medical findings) and a human evaluation (through qualitative examples from literature, assessing hallucination, generalization, fluency, and general quality of the generated summaries).

* Accepted in Findings of the EMNLP 2021. Code is available at https://github.com/negrinho/medical_conversation_summarization

Via

Access Paper or Ask Questions