Title: Benchmarking LLMs on the Semantic Overlap Summarization Task

Feb 26, 2024

John Salvador, Naman Bansal, Mousumi Akter, Souvika Sarkar, Anupam Das, Shubhra Kanti Karmaker

Abstract: Semantic Overlap Summarization (SOS) is a constrained multi-document summarization task, where the constraint is to capture the common/overlapping information between two alternative narratives. While recent advancements in Large Language Models (LLMs) have achieved superior performance in numerous summarization tasks, a benchmarking study of the SOS task using LLMs is yet to be performed. As LLMs' responses are sensitive to slight variations in prompt design, a major challenge in conducting such a benchmarking study is to systematically explore a variety of prompts before drawing a reliable conclusion. Fortunately, the recently proposed TELeR taxonomy can be used to design and explore various prompts for LLMs. Using this TELeR taxonomy and 15 popular LLMs, this paper comprehensively evaluates LLMs on the SOS task, assessing their ability to summarize overlapping information from multiple alternative narratives. For evaluation, we report well-established metrics like ROUGE, BERTScore, and SEM-F1 on two different datasets of alternative narratives. We conclude the paper by analyzing the strengths and limitations of various LLMs in terms of their capabilities in capturing overlapping information. The code and datasets used to conduct this study are available at https://anonymous.4open.science/r/llm_eval-E16D.
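
As a rough illustration of the lexical-overlap idea behind a metric such as ROUGE, the sketch below computes a unigram-overlap F1 between a candidate summary and a reference. It is a simplified stand-in, not the official ROUGE implementation and not the evaluation code used in the paper: the function name `rouge1_f1`, the lowercasing, and the whitespace tokenization are all assumptions made for this example.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a candidate and a reference summary.

    Hypothetical helper: assumes lowercasing and whitespace tokenization,
    with no stemming or stopword handling.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: a token counts only as often as it occurs in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "the senate passed the budget bill on friday"
    candidate = "the budget bill passed on friday"
    print(f"ROUGE-1 style F1: {rouge1_f1(candidate, reference):.3f}")
```

A score of this kind rewards shared wording only; embedding-based metrics such as BERTScore and SEM-F1 are intended to also credit paraphrased overlap, which this sketch does not attempt.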
