Title: DiscoScore: Evaluating Text Generation with BERT and Discourse Coherence

Jan 28, 2022

Wei Zhao, Michael Strube, Steffen Eger

Abstract: Recently, there has been growing interest in designing text generation systems from a discourse coherence perspective, e.g., modeling the interdependence between sentences. Still, recent BERT-based evaluation metrics cannot recognize coherence and fail to punish incoherent elements in system outputs. In this work, we introduce DiscoScore, a parametrized discourse metric, which uses BERT to model discourse coherence from different perspectives, driven by Centering theory. Our experiments encompass 16 non-discourse and discourse metrics, including DiscoScore and popular coherence models, evaluated on summarization and document-level machine translation (MT). We find that (i) the majority of BERT-based metrics correlate much worse with human-rated coherence than early discourse metrics invented a decade ago; (ii) the recent state-of-the-art BARTScore is weak when operated at system level, which is particularly problematic as systems are typically compared in this manner. DiscoScore, in contrast, achieves strong system-level correlation with human ratings, not only in coherence but also in factual consistency and other aspects, and surpasses BARTScore by over 10 correlation points on average. Further, aiming to understand DiscoScore, we justify the importance of discourse coherence for evaluation metrics, and explain the superiority of one variant over another. Our code is available at https://github.com/AIPHES/DiscoScore.

* v2: small fixes in the abstract
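
The abstract distinguishes correlation "at system level" from finer-grained evaluation. As a rough illustration of what that usually means, the sketch below averages each system's per-document scores and then correlates the averaged metric scores with the averaged human ratings. This is a minimal sketch, not the paper's evaluation code: the function names, the choice of Pearson correlation, and the toy scores are all assumptions made for illustration, and none of the numbers come from the paper.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def system_level_correlation(metric_scores, human_scores):
    """Correlate per-system averages of metric scores and human ratings.

    Both arguments map a system name to a list of per-document scores
    (hypothetical data layout, chosen for this example).
    """
    systems = sorted(metric_scores)
    metric_avg = [mean(metric_scores[s]) for s in systems]
    human_avg = [mean(human_scores[s]) for s in systems]
    return pearson(metric_avg, human_avg)


# Toy, made-up scores for three hypothetical systems (not results from the paper).
toy_metric = {"sys_a": [0.2, 0.4], "sys_b": [0.5, 0.7], "sys_c": [0.8, 1.0]}
toy_human = {"sys_a": [1.0, 2.0], "sys_b": [2.0, 3.0], "sys_c": [4.0, 4.0]}

r = system_level_correlation(toy_metric, toy_human)
print(f"system-level Pearson r = {r:.3f}")
```

Because each system is reduced to a single averaged score before correlating, a metric can rank individual outputs reasonably well and still rank systems poorly, or the reverse. This is why the two levels are reported separately.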
