Title: DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering

May 02, 2020

Qingqing Cao, Harsh Trivedi, Aruna Balasubramanian, Niranjan Balasubramanian

Abstract: Transformer-based QA models use input-wide self-attention (i.e., across both the question and the input passage) at all layers, causing them to be slow and memory-intensive. It turns out that we can get by without input-wide self-attention at all layers, especially in the lower layers. We introduce DeFormer, a decomposed transformer, which substitutes the full self-attention with question-wide and passage-wide self-attentions in the lower layers. This allows for question-independent processing of the input text representations, which in turn enables pre-computing passage representations, drastically reducing runtime compute. Furthermore, because DeFormer is largely similar to the original model, we can initialize DeFormer with the pre-trained weights of a standard transformer and directly fine-tune on the target QA dataset. We show that DeFormer versions of BERT and XLNet can be used to speed up QA by over 4.3x, and that with simple distillation-based losses they incur only a 1% drop in accuracy. We open source the code at https://github.com/StonyBrookNLP/deformer.
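
The decomposition described in the abstract can be illustrated with a small attention-mask sketch. This is not code from the DeFormer repository: the function name `attention_mask`, the parameter `num_lower_layers`, and the token and layer counts in the usage lines are assumptions chosen for illustration. It assumes the input is laid out as question tokens followed by passage tokens.

```python
def attention_mask(q_len, p_len, layer, num_lower_layers):
    """Return an n x n mask; mask[i][j] is True if token i may attend to token j.

    Hypothetical helper: tokens 0..q_len-1 are the question, the rest are
    the passage. Layers below num_lower_layers are "decomposed": attention
    stays within the question or within the passage. Layers at or above it
    use ordinary input-wide self-attention.
    """
    n = q_len + p_len
    if layer >= num_lower_layers:
        # Upper layers: every token attends to every token.
        return [[True] * n for _ in range(n)]

    def same_segment(i, j):
        # True when both tokens are in the question or both in the passage.
        return (i < q_len) == (j < q_len)

    # Lower layers: block-diagonal mask (question block, passage block).
    return [[same_segment(i, j) for j in range(n)] for i in range(n)]


# Usage: 2 question tokens, 3 passage tokens, 2 decomposed lower layers
# (all three numbers are arbitrary illustration values).
lower = attention_mask(2, 3, layer=0, num_lower_layers=2)
upper = attention_mask(2, 3, layer=2, num_lower_layers=2)
```

In the lower-layer mask, the passage rows never reference question columns, so under this sketch the passage representations up to layer `num_lower_layers` do not depend on the question. That independence is what allows them to be computed once offline and reused across questions.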

* ACL 2020 camera ready
