Yifan Ge

Friends and Foes in Learning from Noisy Labels

Mar 28, 2021

Yifan Zhou, Yifan Ge, Jianxin Wu

Abstract: Learning from examples with noisy labels has attracted increasing attention recently. However, this paper shows that the commonly used CIFAR-based datasets and the accuracy metric used in the literature are both inappropriate in this context. An alternative, valid evaluation metric and new datasets are proposed to promote proper research and evaluation in this area. Existing methods are then split into technical components that are either beneficial (friends) or detrimental (foes) to deep learning from noisily labeled examples. This paper improves and combines components from the friends category, including self-supervised learning, a new warmup strategy, instance filtering, and label correction. The resulting F&F method significantly outperforms existing methods on the proposed nCIFAR datasets and the real-world Clothing1M dataset.

In Defense of Feature Mimicking for Knowledge Distillation

Nov 03, 2020

Guo-Hua Wang, Yifan Ge, Jianxin Wu

Abstract: Knowledge distillation (KD) is a popular method to train efficient networks ("student") with the help of high-capacity networks ("teacher"). Traditional methods use the teacher's soft logits as extra supervision to train the student network. In this paper, we argue that it is more advantageous to make the student mimic the teacher's features in the penultimate layer. Not only can the student directly learn more effective information from the teacher's features, but feature mimicking can also be applied to teachers trained without a softmax layer. Experiments show that it can achieve higher accuracy than traditional KD. To further facilitate feature mimicking, we decompose a feature vector into its magnitude and its direction. We argue that the teacher should give more freedom to the magnitude of the student's features and let the student pay more attention to mimicking the feature direction. To meet this requirement, we propose a loss term based on locality-sensitive hashing (LSH). With the help of this new loss, our method mimics feature directions more accurately, relaxes constraints on feature magnitudes, and achieves state-of-the-art distillation accuracy.
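
The magnitude/direction decomposition and a direction-focused hashing loss might be sketched as follows. This is only an illustration, not the paper's implementation: the abstract does not give the loss formula, so the sketch assumes sign-random-projection LSH with a binary cross-entropy between teacher hash bits and student soft bits. The function names `split_magnitude_direction` and `lsh_direction_loss`, the number of hyperplanes, and the feature sizes are all hypothetical.

```python
import numpy as np


def split_magnitude_direction(features, eps=1e-12):
    """Split each row vector into its L2 norm and a unit-length direction."""
    magnitude = np.linalg.norm(features, axis=1, keepdims=True)
    direction = features / np.maximum(magnitude, eps)
    return magnitude, direction


def lsh_direction_loss(student, teacher, hyperplanes):
    """Assumed LSH-style loss (not taken from the paper).

    Teacher hash bits are the signs of random projections, which depend only
    on the teacher feature's direction. The student is pushed to land on the
    same side of every hyperplane via binary cross-entropy on its logits.
    """
    teacher_bits = (teacher @ hyperplanes > 0).astype(np.float64)
    logits = student @ hyperplanes
    # Numerically stable BCE with logits: max(z, 0) - z*y + log(1 + exp(-|z|))
    loss = (np.maximum(logits, 0) - logits * teacher_bits
            + np.log1p(np.exp(-np.abs(logits))))
    return loss.mean()


# Toy data: 8 feature vectors of dimension 16, hashed with 32 random hyperplanes.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))
planes = rng.normal(size=(16, 32))

magnitude, direction = split_magnitude_direction(teacher)
aligned = lsh_direction_loss(teacher, teacher, planes)    # same direction
opposed = lsh_direction_loss(-teacher, teacher, planes)   # opposite direction
rescaled = lsh_direction_loss(teacher, 3.0 * teacher, planes)
```

In this sketch, rescaling the teacher features leaves the hash bits unchanged, so the target constrains only the direction; a student pointing the same way as the teacher gets a lower loss than one pointing the opposite way.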
