Title: All You Need in Knowledge Distillation Is a Tailored Coordinate System

Dec 12, 2024

Junjie Zhou, Ke Zhu, Jianxin Wu

Abstract: Knowledge Distillation (KD) is essential in transferring dark knowledge from a large teacher to a small student network, so that the student can be much more efficient than the teacher while retaining comparable accuracy. Existing KD methods, however, rely on a large teacher trained specifically for the target task, which is both inflexible and inefficient. In this paper, we argue that an SSL-pretrained model can effectively act as the teacher and that its dark knowledge can be captured by the coordinate system, or linear subspace, in which the features lie. We then need only one forward pass of the teacher, after which we tailor the coordinate system (TCS) for the student network. Our TCS method is teacher-free and applies to diverse architectures, works well for KD and practical few-shot learning, and allows cross-architecture distillation with a large capacity gap. Experiments show that TCS achieves significantly higher accuracy than state-of-the-art KD methods, while requiring only roughly half of their training time and GPU memory costs.
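
To make the idea of a feature "coordinate system" concrete, here is a minimal sketch of extracting a linear subspace from features collected in a single forward pass. The abstract does not say how the coordinate system is computed, so this sketch assumes PCA (via SVD) purely for illustration. The function names `fit_coordinate_system` and `project`, the random stand-in features, and the subspace size `k=8` are all hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_coordinate_system(features, k):
    """Return the feature mean and the top-k principal directions.

    Assumption: PCA stands in for whatever procedure the paper uses.
    """
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are orthonormal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]


def project(features, mean, basis):
    """Express features as coordinates in the k-dimensional subspace."""
    return (features - mean) @ basis.T


# Random stand-in for teacher features: 256 samples, 64 dimensions.
rng = np.random.default_rng(0)
teacher_features = rng.normal(size=(256, 64))

mean, basis = fit_coordinate_system(teacher_features, k=8)
coords = project(teacher_features, mean, basis)
```

In a distillation setting, one plausible use of such a basis is as a fixed target space in which student features are compared with teacher features; how TCS actually tailors the coordinate system for the student is described in the paper, not here.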
