Title: A Call for Prudent Choice of Subword Merge Operations

May 24, 2019

Shuoyang Ding, Adithya Renduchintala, Kevin Duh

Abstract: Most neural machine translation systems are built upon subword units extracted by methods such as Byte-Pair Encoding (BPE) or wordpiece. However, the number of merge operations is generally chosen by following existing recipes. In this paper, we conduct a systematic exploration of different numbers of BPE merge operations to understand how this choice interacts with the model architecture, the strategy used to build vocabularies, and the language pair. Our exploration could provide guidance for selecting proper BPE configurations in the future. Most prominently, we show that for LSTM-based architectures it is necessary to experiment with a wide range of BPE merge operations, as there is no typical optimal BPE configuration, whereas for Transformer architectures a smaller BPE size tends to be a typically optimal choice. We urge the community to make prudent choices with subword merge operations, as our experiments indicate that a sub-optimal BPE configuration alone could easily reduce system performance by 3-4 BLEU points.

* Accepted to MT Summit 2019
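
To make concrete what the "number of merge operations" in the abstract controls, here is a minimal sketch of BPE merge learning on a toy word-frequency table. It is an illustration only, not the authors' experimental code: the function names (`learn_bpe`, `merge_pair`, `token_count`), the end-of-word marker `</w>`, the tie-breaking rule, and the toy corpus are all assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` BPE merges from a {word: frequency} table.

    Returns the ordered list of merged pairs and the segmented vocabulary.
    """
    # Start from characters, with a hypothetical end-of-word marker "</w>".
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties are broken by the pair itself so runs are deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges, vocab


def token_count(vocab):
    """Total number of subword tokens needed to encode the corpus."""
    return sum(len(symbols) * freq for symbols, freq in vocab.items())


# Toy corpus (assumed for illustration only).
corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}

if __name__ == "__main__":
    for n in (0, 2, 5, 10, 50):
        merges, vocab = learn_bpe(corpus, n)
        symbols = {s for word in vocab for s in word}
        print(f"merges={len(merges):2d}  subword types={len(symbols):2d}  "
              f"corpus tokens={token_count(vocab)}")
```

Running the loop shows the trade-off behind the choice: each additional merge shortens the encoded corpus, and with enough merges every word in the toy corpus collapses into a single symbol.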
