Title: Supervision Complexity and its Role in Knowledge Distillation

Jan 28, 2023

Hrayr Harutyunyan, Ankit Singh Rawat, Aditya Krishna Menon, Seungyeon Kim, Sanjiv Kumar

Abstract: Despite the popularity and efficacy of knowledge distillation, there is limited understanding of why it helps. In order to study the generalization behavior of a distilled student, we propose a new theoretical framework that leverages supervision complexity: a measure of alignment between teacher-provided supervision and the student's neural tangent kernel. The framework highlights a delicate interplay among the teacher's accuracy, the student's margin with respect to the teacher predictions, and the complexity of the teacher predictions. Specifically, it provides a rigorous justification for the utility of various techniques that are prevalent in the context of distillation, such as early stopping and temperature scaling. Our analysis further suggests the use of online distillation, where a student receives increasingly complex supervision from teachers at different stages of their training. We demonstrate the efficacy of online distillation and validate the theoretical findings on a range of image classification benchmarks and model architectures.
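The abstract names temperature scaling as one of the techniques the framework justifies. As an illustration only, the following is a minimal sketch of the standard temperature-scaled distillation objective (KL divergence between softened teacher and student distributions). It is not the authors' code: the function names, the toy logits, and the temperature values are all assumptions made for this example.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Softmax over the last axis after dividing logits by the temperature."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Mean KL(teacher || student), both distributions softened by the same temperature."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = (p_teacher * (np.log(p_teacher) - np.log(p_student))).sum(axis=-1)
    return kl.mean()

def entropy(p):
    """Shannon entropy of each row of a probability matrix."""
    return -(p * np.log(p)).sum(axis=-1)

# Hypothetical logits for two examples and three classes.
teacher_logits = np.array([[4.0, 1.0, 0.0], [0.5, 3.0, 0.2]])
student_logits = np.array([[2.0, 1.5, 0.1], [0.3, 1.0, 0.9]])

sharp = softmax(teacher_logits, temperature=1.0)   # near one-hot targets
smooth = softmax(teacher_logits, temperature=4.0)  # softer targets
loss = distillation_loss(student_logits, teacher_logits, temperature=4.0)
```

Raising the temperature flattens the teacher's distribution, so the student is asked to fit softer targets. This is only an informal picture of what "simpler supervision" could mean here; the paper's formal definition of supervision complexity is stated in terms of the student's neural tangent kernel and is not reproduced in this sketch.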

* Published at ICLR 2023
