SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization

Nov 29, 2019

Bogdan Gliwa, Iwona Mochol, Maciej Biesek, Aleksander Wawer

Abstract: This paper introduces the SAMSum Corpus, a new dataset with abstractive dialogue summaries. We investigate the challenges it poses for automated summarization by testing several models and comparing their results with those obtained on a corpus of news articles. We show that model-generated summaries of dialogues achieve higher ROUGE scores than model-generated summaries of news, in contrast with the judgement of human evaluators. This suggests that the challenging task of abstractive dialogue summarization requires dedicated models and non-standard quality measures. To our knowledge, our study is the first attempt to introduce a high-quality corpus of chat dialogues, manually annotated with abstractive summaries, that the research community can use for further studies.
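Why can ROUGE scores diverge from human judgement? ROUGE-N counts overlapping n-grams between a generated summary and a reference, so it rewards shared wording rather than factual correctness. The sketch below is a simplified ROUGE-N (clipped n-gram overlap with precision, recall and F1). It is an illustration only, not the official ROUGE toolkit and not the evaluation setup used in the paper. The lowercase whitespace tokenization and the example sentences are assumptions made for this demonstration; the sentences are hypothetical and are not taken from the corpus.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Simplified ROUGE-N between a candidate and a single reference.

    Assumption: tokens come from lowercase whitespace splitting; real ROUGE
    implementations use their own tokenizers and optional stemming.
    Returns (precision, recall, f1).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps min(count_cand, count_ref) per n-gram,
    # i.e. the "clipped" overlap used by ROUGE-N.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical summaries: the candidate swaps who does what.
    reference = "Amanda baked cookies and will bring Jerry some tomorrow."
    candidate = "Jerry baked cookies and will bring Amanda some tomorrow."
    print(rouge_n(candidate, reference, n=1))  # (1.0, 1.0, 1.0)
    print(rouge_n(candidate, reference, n=2))  # (0.625, 0.625, 0.625)
```

The factually wrong candidate gets a perfect ROUGE-1 score because it reuses exactly the same words. In dialogue summaries, where attributing actions to the right speaker matters, this kind of lexical-overlap metric can therefore overrate summaries that a human reader would reject.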

* Proceedings of the 2nd Workshop on New Frontiers in Summarization, Association for Computational Linguistics, November 2019
* The attachment contains the described dataset, archived in 7z format. Please see the attached README and licence. Update from the previous version: changed the format of the train/val/test files in corpus.7z.
