Youngsu Moon

Pacemaker: Intermediate Teacher Knowledge Distillation For On-The-Fly Convolutional Neural Network

Mar 09, 2020

Wonchul Son, Youngbin Kim, Wonseok Song, Youngsu Moon, Wonjun Hwang

Figures 1-4 for Pacemaker: Intermediate Teacher Knowledge Distillation For On-The-Fly Convolutional Neural Network (images not reproduced)

Abstract: There is a need for on-the-fly computation on very low-performance systems such as systems-on-chip (SoCs) and embedded devices. This paper presents pacemaker knowledge distillation, which uses an intermediate ensemble teacher, to make convolutional neural networks usable on these systems. For the on-the-fly system, we consider a student model that uses 1×N on-the-fly filters and a teacher model that uses normal N×N filters. We note three problems in training the student model that are caused by applying the on-the-fly filter. First, the student keeps the same depth but is unavoidably compressed into a thinner model. Second, there is a large capacity gap and parameter-size gap, because only the horizontal receptive field can be selected, not the vertical one. Third, direct distillation causes unstable and degraded performance. To solve these problems, we propose an intermediate teacher, named pacemaker, for the on-the-fly student, so that the student can be trained step by step from the pacemaker and the original teacher. Experiments show that the proposed method yields significant accuracy improvements: on CIFAR-100 with WRN-40-4, accuracy increases by 5.39% over conventional knowledge distillation, which performs even worse than the baseline. Pacemaker knowledge distillation also resolves the training instability that occurs when conventional knowledge distillation is applied without it, by reducing the deviation range.
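The abstract contrasts 1×N student filters with N×N teacher filters and builds on conventional knowledge distillation. The sketch below is not the authors' code. It assumes that "conventional knowledge distillation" means the standard Hinton-style soft-target loss (temperature `T`, scaled by `T²`), and it treats the pacemaker simply as another model that produces logits, since the abstract does not specify its architecture. The two-stage schedule (`pick_teacher`, `switch_epoch`) is a hypothetical reading of "step by step", and all function names are illustrative.

```python
import math

def conv_params(c_in, c_out, kh, kw, bias=True):
    """Number of weights (plus optional biases) in one 2-D convolution layer."""
    return c_out * c_in * kh * kw + (c_out if bias else 0)

def softmax(logits, T=1.0):
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Assumed Hinton-style soft-target loss: T^2 * KL(p_teacher || p_student)."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return T * T * kl

def pick_teacher(epoch, switch_epoch, pacemaker, teacher):
    """Hypothetical schedule: distill from the pacemaker first, then from the original teacher."""
    return pacemaker if epoch < switch_epoch else teacher

if __name__ == "__main__":
    n = 3
    full = conv_params(64, 64, n, n, bias=False)  # N x N teacher-style filter
    thin = conv_params(64, 64, 1, n, bias=False)  # 1 x N on-the-fly student filter
    print(full, thin)  # 36864 12288
```

With N = 3, a 1×N layer keeps one third of the weights of the N×N layer at the same channel width, which illustrates the parameter-size gap the abstract describes between student and teacher.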
