Title: Oracle Teacher: Towards Better Knowledge Distillation

Nov 05, 2021

Ji Won Yoon, Hyung Yong Kim, Hyeonseung Lee, Sunghwan Ahn, Nam Soo Kim

Abstract: Knowledge distillation (KD), best known as an effective method for model compression, aims to transfer the knowledge of a larger network (teacher) to a much smaller network (student). Conventional KD methods usually employ a teacher model trained in a supervised manner, where output labels are treated only as targets. Extending this supervised scheme, we introduce a new type of teacher model for KD, namely Oracle Teacher, that utilizes the embeddings of both the source inputs and the output labels to extract more accurate knowledge to be transferred to the student. The proposed model follows the encoder-decoder attention structure of the Transformer network, which allows the model to attend to related information from the output labels. Extensive experiments are conducted on three different sequence learning tasks: speech recognition, scene text recognition, and machine translation. From the experimental results, we empirically show that the proposed model improves the students across these tasks while achieving a considerable speed-up in the teacher model's training time.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
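
For readers new to the teacher-student setup the abstract describes, here is a minimal sketch of a generic soft-label distillation loss: the KL divergence between temperature-softened teacher and student distributions. This is the common textbook form of KD, not the Oracle Teacher objective, which the abstract does not specify. The function names `log_softmax` and `kd_loss` and the default temperature are assumptions made for illustration.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic KD loss (illustrative, not the paper's objective):
    T^2 * KL(teacher || student) on temperature-softened distributions."""
    log_p = log_softmax(teacher_logits, temperature)  # teacher (soft targets)
    log_q = log_softmax(student_logits, temperature)  # student predictions
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl

# The loss vanishes when the student matches the teacher and is
# positive when the two distributions disagree.
matched = kd_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = kd_loss([0.0, 0.0, 5.0], [5.0, 0.0, 0.0])
```

In practice this term is typically combined with the ordinary supervised loss on the ground-truth labels; the weighting between the two is a tuning choice outside the scope of this sketch.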
