Transformer language models trained on vast amounts of data have achieved remarkable success on various NLP benchmarks. Intriguingly, this success is achieved by models that lack an explicit modeling of hierarchical syntactic structures, which decades of linguistic research hypothesized to be necessary for good generalization. This naturally raises a question: to what extent can we further improve the performance of Transformer language models through an inductive bias that encourages the model to explain the data through the lens of recursive syntactic compositions? Although the benefits of modeling recursive syntax have been shown at small data and model scales, it remains an open question whether -- and to what extent -- a similar design principle is still beneficial for powerful Transformer language models that work well at scale. To answer these questions, we introduce Transformer Grammars -- a novel class of Transformer language models that combine (i) the expressive power, scalability, and strong performance of Transformers, and (ii) recursive syntactic compositions, implemented here through a special attention mask. We find that Transformer Grammars outperform various strong baselines on multiple syntax-sensitive language modeling evaluation metrics, as well as on sentence-level language modeling perplexity. Nevertheless, we find that the recursive syntactic composition bottleneck harms perplexity on document-level modeling, providing evidence that a different kind of memory mechanism, one that operates independently of syntactic structure, plays an important role in the processing of long-form text.
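
To give a concrete sense of how recursive syntactic composition can be expressed as an attention mask, the sketch below is a minimal, assumption-laden simplification, not the exact masking scheme, tokenization, or linearization used in this work: it only illustrates the general idea that a closing-nonterminal position over a linearized parse can be restricted to attend within the constituent it closes, so that its representation acts as a composed summary of that constituent.

```python
# Illustrative sketch only (simplifying assumptions, not the paper's exact scheme):
# build a boolean attention mask over a linearized parse in which a closing
# nonterminal attends only to the tokens of the constituent it closes, while
# other tokens use ordinary causal attention.
import numpy as np

def composition_mask(tokens):
    """tokens: linearized tree, e.g. ["(S", "(NP", "the", "dog", "NP)", ...]."""
    n = len(tokens)
    mask = np.zeros((n, n), dtype=bool)  # mask[i, j] = True: position i may attend to j
    stack = []  # indices of currently open constituents
    for i, tok in enumerate(tokens):
        if tok.startswith("("):            # opening nonterminal
            mask[i, : i + 1] = True        # ordinary causal attention
            stack.append(i)
        elif tok.endswith(")"):            # closing nonterminal: compose
            start = stack.pop()
            mask[i, start : i + 1] = True  # attend only within the closed constituent
        else:                              # terminal word
            mask[i, : i + 1] = True        # ordinary causal attention
    return mask

if __name__ == "__main__":
    toks = ["(S", "(NP", "the", "dog", "NP)", "(VP", "barks", "VP)", "S)"]
    print(composition_mask(toks).astype(int))
```

In this toy version, the composed representation at a closing nonterminal is the only bottleneck through which the constituent's content is summarized; the full model additionally needs a mechanism for subsequent tokens to access that summary rather than the constituent's internal tokens.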