Title: Distribution Shift Matters for Knowledge Distillation with Webly Collected Images

Jul 21, 2023

Jialiang Tang, Shuo Chen, Gang Niu, Masashi Sugiyama, Chen Gong

Abstract: Knowledge distillation aims to learn a lightweight student network from a pre-trained teacher network. In practice, existing knowledge distillation methods are usually infeasible when the original training data is unavailable because of privacy issues or data management considerations. Data-free knowledge distillation approaches have therefore been proposed that collect training instances from the Internet. However, most of them ignore the common distribution shift between instances from the original training data and webly collected data, which affects the reliability of the trained student network. To solve this problem, we propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD$^{3}$), which consists of three components. First, we dynamically select useful training instances from the webly collected data according to the combined predictions of the teacher network and the student network. Second, we align both the weighted features and the classifier parameters of the two networks for knowledge memorization. Third, we build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment, so that the student network can further learn a distribution-invariant representation. Extensive experiments on various benchmark datasets demonstrate that the proposed KD$^{3}$ can outperform state-of-the-art data-free knowledge distillation approaches.
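As a rough illustration of the first component (selecting webly collected instances from the combined predictions of teacher and student), the following is a minimal sketch and not the authors' implementation. The abstract does not say how the two predictions are combined or how instances are scored, so the convex mixing weight `alpha`, the confidence `threshold`, and the function names are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_instances(teacher_logits, student_logits, alpha=0.5, threshold=0.6):
    """Return indices of instances whose combined top-class probability
    reaches the threshold.

    alpha and threshold are hypothetical knobs for this sketch; the
    abstract does not specify the combination or selection rule.
    """
    selected = []
    for i, (t, s) in enumerate(zip(teacher_logits, student_logits)):
        p_teacher, p_student = softmax(t), softmax(s)
        # Assumed combination: convex mix of the two probability vectors.
        combined = [alpha * a + (1 - alpha) * b
                    for a, b in zip(p_teacher, p_student)]
        if max(combined) >= threshold:
            selected.append(i)
    return selected


# Toy logits for three web images over three classes (made-up values).
teacher = [[4.0, 0.0, 0.0], [0.1, 0.0, 0.2], [0.0, 3.0, 0.0]]
student = [[3.0, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.2, 0.1]]
kept = select_instances(teacher, student)
print(kept)
```

In this toy setup, an instance on which both networks are close to uniform is dropped, while an instance the teacher is confident about can still be kept even if the student is unsure. Because the selection depends on the student's current predictions, rerunning it during training would give a selection that changes over time, which is one plausible reading of "dynamically select".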
