Abstract:This effort aims to reproduce the results of experiments and analyze the robustness of the review framework for knowledge distillation introduced in the CVPR '21 paper 'Distilling Knowledge via Knowledge Review' by Chen et al. Previous works in knowledge distillation only studied connections paths between the same levels of the student and the teacher, and cross-level connection paths had not been considered. Chen et al. propose a new residual learning framework to train a single student layer using multiple teacher layers. They also design a novel fusion module to condense feature maps across levels and a loss function to compare feature information stored across different levels to improve performance. In this work, we consistently verify the improvements in test accuracy across student models as reported in the original paper and study the effectiveness of the novel modules introduced by conducting ablation studies and new experiments.
Abstract:Neural networks have now long been used for solving complex problems of image domain, yet designing the same needs manual expertise. Furthermore, techniques for automatically generating a suitable deep learning architecture for a given dataset have frequently made use of reinforcement learning and evolutionary methods which take extensive computational resources and time. We propose a new framework for neural architecture search based on a hill-climbing procedure using morphism operators that makes use of a novel gradient update scheme. The update is based on the aging of neural network layers and results in the reduction in the overall training time. This technique can search in a broader search space which subsequently yields competitive results. We achieve a 4.96% error rate on the CIFAR-10 dataset in 19.4 hours of a single GPU training.