Title: TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising

Jan 06, 2020

Ziyi Yang, Chenguang Zhu, Robert Gmyr, Michael Zeng, Xuedong Huang, Eric Darve

Abstract: Text summarization aims to extract essential information from a piece of text and transform it into a concise version. Existing unsupervised abstractive summarization models are built on recurrent neural networks and ignore the abundant unlabeled corpora available. To address these issues, we propose TED, a transformer-based unsupervised summarization system pretrained on large-scale data. We first leverage the lead bias in news articles to pretrain the model on large-scale corpora. Then, we finetune TED on target domains through theme modeling and a denoising autoencoder to enhance the quality of summaries. Notably, TED outperforms all unsupervised abstractive baselines on the NYT, CNN/DM and English Gigaword datasets, which cover various document styles. Further analysis shows that the summaries generated by TED are abstractive and contain even higher proportions of novel tokens than those from supervised models.

* 10 pages, 3 figures
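
The two data-preparation ideas named in the abstract, lead-bias pretraining and denoising, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how many leading sentences serve as the pseudo-summary or which noise is applied, so `num_lead=3`, the regex sentence splitter, and the token-dropping noise in `add_noise` are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
import random
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s.strip() for s in parts if s.strip()]


def lead_bias_pair(article, num_lead=3):
    """Build a (source, target) pretraining pair from one news article.

    Assumption: the first `num_lead` sentences act as the pseudo-summary
    (target) and the remaining sentences form the model input (source).
    Returns None when the article is too short to split.
    """
    sentences = split_sentences(article)
    if len(sentences) <= num_lead:
        return None
    target = " ".join(sentences[:num_lead])
    source = " ".join(sentences[num_lead:])
    return source, target


def add_noise(tokens, drop_prob=0.1, rng=None):
    """Corrupt a token list by randomly dropping tokens.

    Assumption: token dropping stands in for whatever noise the paper uses.
    A denoising autoencoder would be trained to map the corrupted list
    back to the original `tokens`.
    """
    rng = rng or random.Random(0)
    kept = [t for t in tokens if rng.random() >= drop_prob]
    # Keep at least one token so the corrupted input is never empty.
    return kept if kept else list(tokens[:1])
```

For example, `lead_bias_pair(article)` yields a pair in which the model must generate the opening sentences from the rest of the article, so no human-written summaries are needed. `add_noise(summary.split())` yields the corrupted input for a reconstruction objective.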

View paper on

OpenReview
