Abstract:Knowledge distillation is an effective transfer of knowledge from a heavy network (teacher) to a small network (student) to boost students' performance. Self-knowledge distillation, the special case of knowledge distillation, has been proposed to remove the large teacher network training process while preserving the student's performance. This paper introduces a novel Self-knowledge distillation approach via Siamese representation learning, which minimizes the difference between two representation vectors of the two different views from a given sample. Our proposed method, SKD-SRL, utilizes both soft label distillation and the similarity of representation vectors. Therefore, SKD-SRL can generate more consistent predictions and representations in various views of the same data point. Our benchmark has been evaluated on various standard datasets. The experimental results have shown that SKD-SRL significantly improves the accuracy compared to existing supervised learning and knowledge distillation methods regardless of the networks.