Jinsong Fan

Student-Oriented Teacher Knowledge Refinement for Knowledge Distillation

Sep 27, 2024

Chaomin Shen, Yaomin Huang, Haokun Zhu, Jinsong Fan, Guixu Zhang

Abstract: Knowledge distillation is widely recognized for its ability to transfer knowledge from a large teacher network to a compact, more streamlined student network. Traditional knowledge distillation methods primarily follow a teacher-oriented paradigm that imposes the task of learning the teacher's complex knowledge onto the student network. However, significant disparities in model capacity and architectural design hinder the student's comprehension of the complex knowledge imparted by the teacher, resulting in sub-optimal performance. This paper introduces a novel student-oriented perspective that refines the teacher's knowledge to better align with the student's needs, thereby improving knowledge transfer effectiveness. Specifically, we present Student-Oriented Knowledge Distillation (SoKD), which incorporates a learnable feature augmentation strategy during training to dynamically refine the teacher's knowledge for the student. Furthermore, we deploy the Distinctive Area Detection Module (DAM) to identify areas of mutual interest between the teacher and student, concentrating knowledge transfer within these critical areas to avoid transferring irrelevant information. This customized module ensures a more focused and effective knowledge distillation process. Our approach, functioning as a plug-in, can be integrated with various knowledge distillation methods. Extensive experimental results demonstrate the efficacy and generalizability of our method.
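
For context, the traditional teacher-oriented setup that the abstract contrasts against is often implemented as a temperature-softened KL divergence between teacher and student outputs. The sketch below shows only that classic baseline, not SoKD or DAM, whose details the abstract does not give. The function names and the default temperature are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [v / total for v in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Classic distillation loss: T^2 * KL(teacher || student) on softened outputs.

    Hypothetical helper for illustration; this is the generic baseline,
    not the method proposed in the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student))
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as the two softened distributions diverge, which is the sense in which the student is asked to match the teacher regardless of its own capacity.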

Defending Against Adversarial Attacks by Suppressing the Largest Eigenvalue of Fisher Information Matrix

Sep 13, 2019

Chaomin Shen, Yaxin Peng, Guixu Zhang, Jinsong Fan

Abstract: We propose a scheme for defending against adversarial attacks by suppressing the largest eigenvalue of the Fisher information matrix (FIM). Our starting point is one explanation of the rationale behind adversarial examples. The difference between a benign sample and its adversarial example is measured by the Euclidean norm, while the difference between their classification probability densities at the last (softmax) layer of the network can be measured by the Kullback-Leibler (KL) divergence. Based on this, the explanation shows that the output difference is a quadratic form of the input difference. If the eigenvalue of the matrix defining this quadratic form (the FIM) is large, the output difference becomes large even when the input difference is small, which explains the adversarial phenomenon. This makes adversarial defense possible by controlling the eigenvalues of the FIM. Our solution is to add a term representing the trace of the FIM to the loss function of the original network, as the largest eigenvalue is bounded by the trace. Our defensive scheme is verified by experiments using a variety of common attack methods on typical deep neural networks, e.g., LeNet, VGG, and ResNet, with the datasets MNIST, CIFAR-10, and German Traffic Sign Recognition Benchmark (GTSRB). After adopting the novel loss function and retraining, our new network has an effective and robust defensive capability: it decreases the fooling ratio of the generated adversarial examples while retaining the classification accuracy of the original network.
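
The argument in the abstract can be checked numerically on a toy model. The sketch below assumes a linear softmax classifier p(y|x) = softmax(Wx), for which the FIM with respect to the input is W^T (diag(p) - p p^T) W. It shows that the KL divergence between the outputs at x and x + δ is close to the quadratic form ½ δᵀFδ for small δ, and that the trace of F upper-bounds its largest eigenvalue because F is positive semidefinite. All function names are hypothetical, and a real network would obtain the FIM through its Jacobian rather than a fixed W.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [v / total for v in exps]

def logits(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def fisher_matrix(W, x):
    """FIM w.r.t. the input x for the assumed toy model softmax(W x)."""
    p = softmax(logits(W, x))
    k, d = len(W), len(x)
    # G = diag(p) - p p^T is the FIM w.r.t. the logits.
    G = [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(k)]
         for i in range(k)]
    GW = [[sum(G[i][m] * W[m][j] for m in range(k)) for j in range(d)]
          for i in range(k)]
    return [[sum(W[m][i] * GW[m][j] for m in range(k)) for j in range(d)]
            for i in range(d)]

def quad_form(F, v):
    n = len(v)
    return sum(v[i] * F[i][j] * v[j] for i in range(n) for j in range(n))

def trace(F):
    return sum(F[i][i] for i in range(len(F)))

def largest_eigenvalue(F, iters=500):
    """Power iteration for a symmetric positive semidefinite matrix."""
    n = len(F)
    v = [1.0 + i for i in range(n)]
    lam = 0.0
    for _ in range(iters):
        w = [sum(F[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return 0.0
        v = [c / norm for c in w]
        lam = quad_form(F, v)  # Rayleigh quotient of the unit vector v
    return lam

def kl(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q))

# Toy weights and input, chosen arbitrarily for illustration.
W = [[1.0, -2.0], [0.5, 1.5], [-1.0, 0.3]]
x = [0.2, -0.1]
delta = [1e-3, -2e-3]

F = fisher_matrix(W, x)
x_adv = [a + b for a, b in zip(x, delta)]
kl_value = kl(softmax(logits(W, x)), softmax(logits(W, x_adv)))
quadratic = 0.5 * quad_form(F, delta)
print(kl_value, quadratic)
print(largest_eigenvalue(F), trace(F))
```

Under this reading, a trace penalty would enter training as something like `loss + weight * trace(F)`, where `weight` is an assumed hyperparameter not specified in the abstract.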

* 11 pages, 5 figures
