Title: TinyBERT: Distilling BERT for Natural Language Understanding

Sep 24, 2019

Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

Abstract: Language model pre-training, as in BERT, has significantly improved performance on many natural language processing tasks. However, pre-trained language models are usually computationally expensive and memory intensive, which makes them difficult to run effectively on resource-restricted devices. To accelerate inference and reduce model size while maintaining accuracy, we first propose a novel transformer distillation method, a knowledge distillation (KD) method specially designed for transformer-based models. With this new KD method, the rich knowledge encoded in a large teacher BERT can be transferred effectively to a small student TinyBERT. Moreover, we introduce a new two-stage learning framework for TinyBERT, which performs transformer distillation at both the pre-training and task-specific learning stages. This framework ensures that TinyBERT can capture both the general-domain and task-specific knowledge of the teacher BERT. TinyBERT is empirically effective and achieves results comparable to BERT on the GLUE datasets while being 7.5x smaller and 9.4x faster at inference. TinyBERT is also significantly better than state-of-the-art baselines, even with only about 28% of their parameters and 31% of their inference time.
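The abstract does not spell out the distillation objectives, so the following is only a minimal sketch of the generic teacher-student idea behind knowledge distillation: the student is trained to match the teacher's softened output distribution. The function names, the temperature parameter, and the example logits are illustrative assumptions, not the paper's transformer distillation method, which is designed specifically for transformer-based models.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits.

    `temperature` is an assumed hyperparameter: values above 1 soften
    the distribution, as is common in generic knowledge distillation.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(student_logits, teacher_logits, temperature=1.0):
    """Cross-entropy of the student's distribution against the teacher's.

    Generic prediction-level distillation loss (an illustration only);
    it is smallest when the student reproduces the teacher's distribution.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


if __name__ == "__main__":
    # Hypothetical logits for a three-class task.
    teacher = [2.0, 0.5, -1.0]
    good_student = [1.9, 0.6, -1.1]
    poor_student = [-1.0, 0.5, 2.0]
    print(soft_cross_entropy(good_student, teacher))
    print(soft_cross_entropy(poor_student, teacher))
```

In this sketch, a student whose logits track the teacher's gets a lower loss than one that disagrees. Under the two-stage framework the abstract describes, a distillation objective would be applied once during general pre-training and again during task-specific learning.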

* 13 pages, 2 figures, 9 tables

View paper on OpenReview
