Title: GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model

Jun 11, 2023

Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, Hongyin Tang, Keqing He, Jiahao Liu, Jingang Wang, Shu Zhao (+2 more)

Abstract: Reducing the parameter scale of large-scale pre-trained language models (PLMs) through knowledge distillation has greatly facilitated their deployment on a wide range of devices. However, deploying knowledge distillation systems in real-world, industrial-strength applications remains challenging: such applications require complex distillation methods on even larger PLMs (over 10B parameters), and they are constrained by GPU memory and by the difficulty of switching between methods. To overcome these challenges, we propose GKD, a general knowledge distillation framework that supports distillation on larger-scale PLMs using various distillation methods. With GKD, developers can build larger distillation models on memory-limited GPUs and easily switch between and combine different distillation methods within a single framework. Experimental results show that GKD can support the distillation of at least 100B-scale PLMs and 25 mainstream methods on 8 NVIDIA A100 (40GB) GPUs.

* accepted for ACL 2023 industry track
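
To make the idea of "switching and combining distillation methods" concrete, here is a minimal sketch in plain Python. It is not GKD's API, which the abstract does not describe. The function names, the `METHODS` registry, and the two loss terms are all assumptions chosen for illustration: a classic temperature-scaled soft-label loss (KL divergence between teacher and student distributions) and a simple mean-squared error on logits.

```python
import math
from typing import Callable, Dict, List, Sequence

Logits = Sequence[float]
LossFn = Callable[[Logits, Logits], float]


def log_softmax(logits: Logits, temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]


def soft_label_kd(student: Logits, teacher: Logits, temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    log_p = log_softmax(teacher, temperature)
    log_q = log_softmax(student, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


def logit_mse(student: Logits, teacher: Logits) -> float:
    """Mean squared error between student and teacher logits."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


# Hypothetical registry: each distillation method is just a named loss term.
METHODS: Dict[str, LossFn] = {
    "soft_label": soft_label_kd,
    "logit_mse": logit_mse,
}


def combined_loss(student: Logits, teacher: Logits, weights: Dict[str, float]) -> float:
    """Weighted sum of the selected methods; switching methods means changing `weights`."""
    return sum(w * METHODS[name](student, teacher) for name, w in weights.items())
```

Under these assumptions, switching methods amounts to passing a different `weights` dictionary, for example `{"soft_label": 1.0}` versus `{"soft_label": 0.5, "logit_mse": 0.5}`, without touching the training loop. The memory-related side of the framework (fitting very large teachers on limited GPUs) is outside the scope of this sketch.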
