Title: A Tensorized Transformer for Language Modeling

Aug 09, 2019

Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, Dawei Song, Ming Zhou

Abstract: Recent neural models have connected the encoder and decoder through a self-attention mechanism. In particular, the Transformer, which is based solely on self-attention, has led to breakthroughs in Natural Language Processing (NLP) tasks. However, multi-head attention, a key component of the Transformer, makes the model difficult to deploy effectively in resource-limited settings. In this paper, based on the ideas of tensor decomposition and parameter sharing, we propose a novel self-attention model, Multi-linear attention, built on Block-Term Tensor Decomposition (BTD). We evaluate the proposed attention method on three language modeling tasks (PTB, WikiText-103 and One-billion) and a neural machine translation task (WMT-2016 English-German). Multi-linear attention not only substantially compresses the model parameters but also improves performance compared with a number of language modeling approaches, such as Transformer, Transformer-XL, and Transformer with tensor train decomposition.
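As a rough illustration of the two ideas the abstract names, the sketch below builds a small third-order tensor as an average of Tucker-style terms that all reuse the same factor matrices. A block-term decomposition in general expresses a tensor as a sum of such low-rank terms; the diagonal cores, the averaging step, the shapes, and every name here (diagonal_tucker_term, block_term_tensor, Q, K, V, cores) are assumptions made for illustration and are not taken from the paper's code.

```python
# Illustrative sketch only: diagonal cores, shared factors, and averaging
# are assumptions, not the authors' implementation.

def diagonal_tucker_term(Q, K, V, g):
    """Build T[i][j][m] = sum_r g[r] * Q[i][r] * K[j][r] * V[m][r]."""
    rank = len(g)
    return [[[sum(g[r] * Q[i][r] * K[j][r] * V[m][r] for r in range(rank))
              for m in range(len(V))]
             for j in range(len(K))]
            for i in range(len(Q))]

def block_term_tensor(Q, K, V, cores):
    """Average several terms that share the same factor matrices Q, K, V."""
    terms = [diagonal_tucker_term(Q, K, V, g) for g in cores]
    n = len(terms)
    return [[[sum(t[i][j][m] for t in terms) / n
              for m in range(len(V))]
             for j in range(len(K))]
            for i in range(len(Q))]

# Tiny example: 2x2 identity factors and two diagonal cores.
identity = [[1.0, 0.0], [0.0, 1.0]]
cores = [[1.0, 0.0], [0.0, 1.0]]
T = block_term_tensor(identity, identity, identity, cores)
```

In this sketch only the small core vectors differ between blocks, while Q, K, and V are reused by every block, which is one way parameter sharing can reduce the parameter count.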

* Submitted to NeurIPS 2019
