Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Can Transformer Models Measure Coherence In Text? Re-Thinking the Shuffle Test

Jul 07, 2021

Philippe Laban, Luke Dai, Lucas Bandarkar, Marti A. Hearst

Figure 1 for Can Transformer Models Measure Coherence In Text? Re-Thinking the Shuffle Test

Figure 2 for Can Transformer Models Measure Coherence In Text? Re-Thinking the Shuffle Test

Figure 3 for Can Transformer Models Measure Coherence In Text? Re-Thinking the Shuffle Test

Figure 4 for Can Transformer Models Measure Coherence In Text? Re-Thinking the Shuffle Test

Share this with someone who'll enjoy it:

Abstract:The Shuffle Test is the most common task to evaluate whether NLP models can measure coherence in text. Most recent work uses direct supervision on the task; we show that by simply finetuning a RoBERTa model, we can achieve a near perfect accuracy of 97.8%, a state-of-the-art. We argue that this outstanding performance is unlikely to lead to a good model of text coherence, and suggest that the Shuffle Test should be approached in a Zero-Shot setting: models should be evaluated without being trained on the task itself. We evaluate common models in this setting, such as Generative and Bi-directional Transformers, and find that larger architectures achieve high-performance out-of-the-box. Finally, we suggest the k-Block Shuffle Test, a modification of the original by increasing the size of blocks shuffled. Even though human reader performance remains high (around 95% accuracy), model performance drops from 94% to 78% as block size increases, creating a conceptually simple challenge to benchmark NLP models. Code available: https://github.com/tingofurro/shuffle_test/

* Association for Computational Linguistics (2021) * Accepted at ACL-IJCNLP 2021 (short paper), 7 pages, 4 figures

View paper on

Share this with someone who'll enjoy it:

Title:Can Transformer Models Measure Coherence In Text? Re-Thinking the Shuffle Test

Paper and Code