S. H. Gary Chan

StableKD: Breaking Inter-block Optimization Entanglement for Stable Knowledge Distillation

Dec 20, 2023

Shiu-hong Kao, Jierun Chen, S. H. Gary Chan

Abstract: Knowledge distillation (KD) has been recognized as an effective tool to compress and accelerate models. However, current KD approaches generally suffer from an accuracy drop and/or an excruciatingly long distillation process. In this paper, we tackle the issue by first providing a new insight into a phenomenon that we call the Inter-Block Optimization Entanglement (IBOE), which makes conventional end-to-end KD approaches unstable with noisy gradients. We then propose StableKD, a novel KD framework that breaks the IBOE and achieves more stable optimization. StableKD distinguishes itself through two operations, Decomposition and Recomposition: the former divides a pair of teacher and student networks into several blocks for separate distillation, and the latter progressively merges them back, evolving towards end-to-end distillation. We conduct extensive experiments on the CIFAR100, Imagewoof, and ImageNet datasets with various teacher-student pairs. Compared to other KD approaches, our simple yet effective StableKD boosts model accuracy by 1% to 18%, speeds up convergence by up to 10 times, and outperforms them with only 40% of the training data.
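
To make the Decomposition and Recomposition idea more concrete, the following is a minimal sketch of how a block-grouping schedule might look. It is not the authors' implementation: the function names (`decompose`, `recompose`, `distillation_schedule`) and the pairwise merging rule are assumptions made purely for illustration, and the abstract does not specify how or when blocks are merged. Each group of block indices stands for a set of consecutive teacher-student blocks that would be distilled together.

```python
from typing import List

Groups = List[List[int]]


def decompose(num_blocks: int) -> Groups:
    """Decomposition: every block starts as its own distillation unit."""
    return [[i] for i in range(num_blocks)]


def recompose(groups: Groups) -> Groups:
    """Recomposition step: merge adjacent groups pairwise.

    The pairwise rule is an illustrative assumption, not taken from the paper.
    An unpaired trailing group is carried over unchanged.
    """
    merged: Groups = []
    for i in range(0, len(groups), 2):
        combined: List[int] = []
        for group in groups[i:i + 2]:
            combined.extend(group)
        merged.append(combined)
    return merged


def distillation_schedule(num_blocks: int) -> List[Groups]:
    """Return the grouping used at each stage, ending with one end-to-end group."""
    if num_blocks < 1:
        raise ValueError("num_blocks must be at least 1")
    groups = decompose(num_blocks)
    stages = [groups]
    while len(groups) > 1:
        groups = recompose(groups)
        stages.append(groups)
    return stages


if __name__ == "__main__":
    for stage, groups in enumerate(distillation_schedule(4)):
        print(f"stage {stage}: {groups}")
```

With four blocks, this sketch starts from four separately distilled blocks, then two merged pairs, and finally a single group covering the whole network, which corresponds to end-to-end distillation. In a real training loop, each group would presumably get its own distillation loss between the student and teacher outputs of that group; that part is omitted here because the abstract gives no detail on it.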
