Title: You should evaluate your language model on marginal likelihood over tokenisations

Sep 21, 2021

Kris Cao, Laura Rimell

Abstract: Neural language models typically tokenise input text into sub-word units to achieve an open vocabulary. The standard approach is to use a single canonical tokenisation at both train and test time. We suggest that this approach is unsatisfactory and may bottleneck our evaluation of language model performance. Using only the one-best tokenisation ignores tokeniser uncertainty over alternative tokenisations, which may hurt the model's out-of-domain performance. In this paper, we argue that language models should instead be evaluated on their marginal likelihood over tokenisations. We compare different sampling-based estimators for the marginal likelihood, and show that it is feasible to estimate the marginal likelihood with a manageable number of samples. We then evaluate pretrained English and German language models on both the one-best-tokenisation and marginal perplexities, and show that the marginal perplexity can be significantly better than the one-best perplexity, especially on out-of-domain data. We link this difference in perplexity to the tokeniser uncertainty as measured by tokeniser entropy. We discuss some implications of our results for language model training and evaluation, particularly with regard to tokenisation robustness.

* accepted at EMNLP 2021
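
To make the distinction in the abstract concrete: the marginal likelihood of a string sums the model's probability over every tokenisation that spells it, whereas the one-best likelihood keeps only a single term of that sum. The sketch below is a toy illustration, not the authors' implementation. The vocabulary, the unigram scores in `TOY_LOGP`, and the uniform proposal are all assumptions made for the example, and importance sampling is shown only as one possible sampling-based estimator; the abstract does not say which estimators the paper compares.

```python
import math
import random

# Hypothetical toy scores: a unigram "language model" over sub-word tokens.
# These numbers are illustrative only and are not taken from the paper.
TOY_LOGP = {
    tok: math.log(p)
    for tok, p in {"a": 0.2, "b": 0.2, "ab": 0.3, "ba": 0.1, "aba": 0.2}.items()
}


def segmentations(text):
    """Enumerate every way to split `text` into tokens from TOY_LOGP."""
    if not text:
        return [[]]
    results = []
    for i in range(1, len(text) + 1):
        head = text[:i]
        if head in TOY_LOGP:
            results.extend([head] + rest for rest in segmentations(text[i:]))
    return results


def lm_logprob(tokens):
    """Log-probability the toy model assigns to one token sequence."""
    return sum(TOY_LOGP[tok] for tok in tokens)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def exact_log_marginal(text):
    """Sum over all tokenisations (only feasible for short toy strings)."""
    return logsumexp([lm_logprob(seg) for seg in segmentations(text)])


def one_best_logprob(text):
    """Score of the single highest-probability tokenisation."""
    return max(lm_logprob(seg) for seg in segmentations(text))


def importance_sampled_log_marginal(text, num_samples, rng):
    """Estimate the log marginal from sampled tokenisations.

    Assumption: the proposal q is uniform over all segmentations. A real
    tokeniser would supply its own sampling distribution instead.
    """
    segs = segmentations(text)
    log_q = -math.log(len(segs))
    log_weights = [
        lm_logprob(rng.choice(segs)) - log_q for _ in range(num_samples)
    ]
    return logsumexp(log_weights) - math.log(num_samples)


if __name__ == "__main__":
    rng = random.Random(0)
    text = "abab"
    print("one-best log-prob :", one_best_logprob(text))
    print("exact log-marginal:", exact_log_marginal(text))
    print("sampled estimate  :", importance_sampled_log_marginal(text, 2000, rng))
```

Because the marginal adds non-negative terms to the one-best term, it can never be lower than the one-best likelihood, so the marginal perplexity can never be worse than the one-best perplexity under the same model. The gap grows when probability mass is spread across several tokenisations, which is the situation the abstract associates with high tokeniser entropy.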