Title: Deeply-Supervised Knowledge Distillation

Feb 16, 2022

Shiya Luo, Defang Chen, Can Wang

Abstract: Knowledge distillation aims to improve the performance of a lightweight student model by exploiting the knowledge of a pre-trained, cumbersome teacher model. In traditional knowledge distillation, however, teacher predictions only provide the supervisory signal for the last layer of the student model. As a result, the shallow student layers may lack accurate training guidance during layer-by-layer backpropagation, which hinders effective knowledge transfer. To address this issue, we propose Deeply-Supervised Knowledge Distillation (DSKD), which fully utilizes the class predictions and feature maps of the teacher model to supervise the training of shallow student layers. DSKD also includes a loss-based weight allocation strategy that adaptively balances the learning process of each shallow layer, further improving student performance. Extensive experiments show that DSKD consistently outperforms state-of-the-art methods on various teacher-student models, confirming the effectiveness of the proposed method.
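The abstract does not give the DSKD formulas, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that each shallow student layer has an auxiliary classifier branch producing logits, that each branch is matched to the teacher's class predictions with the usual temperature-softened KL divergence, and that the per-layer weights come from a softmax over the per-layer losses. The weighting rule, the function names (`kd_loss`, `allocate_weights`, `deeply_supervised_loss`), and the temperature value are all hypothetical; the feature-map supervision mentioned in the abstract is left out.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of floats."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum for s in scaled]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened distributions, scaled by T^2."""
    log_p = log_softmax(teacher_logits, temperature)  # teacher
    log_q = log_softmax(student_logits, temperature)  # student branch
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


def allocate_weights(losses):
    """ASSUMPTION: softmax over the per-layer losses, so that branches with a
    larger loss receive a larger weight. The paper's actual rule may differ."""
    m = max(losses)
    exps = [math.exp(l - m) for l in losses]
    total = sum(exps)
    return [e / total for e in exps]


def deeply_supervised_loss(branch_logits, teacher_logits, temperature=4.0):
    """Weighted sum of the distillation losses of all student branches.

    branch_logits: one logit vector per supervised student layer (hypothetical
    auxiliary classifiers), ordered from shallow to deep.
    """
    losses = [kd_loss(b, teacher_logits, temperature) for b in branch_logits]
    weights = allocate_weights(losses)
    total = sum(w * l for w, l in zip(weights, losses))
    return total, weights, losses


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]
    branches = [
        [0.5, 0.4, 0.3],  # shallow branch, far from the teacher
        [1.5, 1.0, 0.2],  # middle branch, closer to the teacher
        [2.0, 1.0, 0.1],  # last layer, identical to the teacher here
    ]
    total, weights, losses = deeply_supervised_loss(branches, teacher)
    print("per-layer losses:", losses)
    print("per-layer weights:", weights)
    print("total loss:", total)
```

In an actual training loop the weights would presumably be treated as constants when backpropagating, so that they rescale each branch's gradient without being optimized themselves; that detail is also an assumption here.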
