Title: LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding

Dec 14, 2020

Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li

Abstract: Pre-trained models such as BERT have achieved strong results on a variety of natural language processing tasks. However, their large number of parameters demands substantial memory and inference time, which makes them difficult to deploy on edge devices. In this work, we propose LRC-BERT, a knowledge distillation method based on contrastive learning that fits the outputs of the intermediate layers from the perspective of angular distance, an aspect not considered by existing distillation methods. Furthermore, we introduce a gradient-perturbation-based training architecture to increase the robustness of LRC-BERT, which is the first such attempt in knowledge distillation. Additionally, to better capture the distribution characteristics of the intermediate layers, we design a two-stage training method for the total distillation loss. Finally, experiments on 8 datasets of the General Language Understanding Evaluation (GLUE) benchmark show that LRC-BERT outperforms existing state-of-the-art methods, demonstrating the effectiveness of our approach.
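The abstract does not give the loss formula. As a rough illustration of what "fitting intermediate-layer outputs from the angular distance aspect" with contrastive learning can look like, here is a minimal sketch that assumes an InfoNCE-style objective over cosine similarity: for each input, the teacher's representation of the same input is the positive, and the teacher's representations of the other inputs in the batch are negatives. The function names, the `temperature` parameter, and the assumption that student and teacher vectors already share a dimension are illustrative choices, not LRC-BERT's published formulation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def angular_contrastive_loss(
    student: List[Vector],
    teacher: List[Vector],
    temperature: float = 1.0,
) -> float:
    """Hypothetical InfoNCE-style loss over cosine similarities.

    For student row i, teacher row i is the positive and every other
    teacher row in the batch is a negative. Assumption (not from the
    paper): both batches already have the same vector dimension, e.g.
    after some projection of the student's hidden states.
    """
    if not student or len(student) != len(teacher):
        raise ValueError("batches must be non-empty and equal-sized")
    total = 0.0
    for i, s in enumerate(student):
        # Angle-based similarity of this student vector to every teacher vector.
        logits = [cosine_similarity(s, t) / temperature for t in teacher]
        # Numerically stable -log softmax(logits)[i] for the positive pair.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / len(student)


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[2.0, 0.1], [0.1, 3.0]]  # each student row points toward its own teacher row
    swapped = [[0.1, 3.0], [2.0, 0.1]]  # each student row points toward the wrong teacher row
    print(angular_contrastive_loss(aligned, teacher) < angular_contrastive_loss(swapped, teacher))
```

Because only angles enter this loss, rescaling a student vector leaves it unchanged; an MSE-style intermediate-layer loss, by contrast, would also penalize differences in magnitude.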