https://github.com/RuipingL/SKR_PEA.git
For scene understanding in robotics and automated driving, there is a growing interest in solving semantic segmentation tasks with transformer-based methods. However, effective transformers are always too cumbersome and computationally expensive to solve semantic segmentation in real time, which is desired for robotic systems. Moreover, due to the lack of inductive biases compared to Convolutional Neural Networks (CNNs), pre-training on a large dataset is essential but it takes a long time. Knowledge Distillation (KD) speeds up inference and maintains accuracy while transferring knowledge from a pre-trained cumbersome teacher model to a compact student model. Most traditional KD methods for CNNs focus on response-based knowledge and feature-based knowledge. In contrast, we present a novel KD framework according to the nature of transformers, i.e., training compact transformers by transferring the knowledge from feature maps and patch embeddings of large transformers. To this purpose, two modules are proposed: (1) the Selective Kernel Fusion (SKF) module, which helps to construct an efficient relation-based KD framework, Selective Kernel Review (SKR); (2) the Patch Embedding Alignment (PEA) module, which performs the dimensional transformation of patch embeddings. The combined KD framework is called SKR+PEA. Through comprehensive experiments on Cityscapes and ACDC datasets, it indicates that our proposed approach outperforms recent state-of-the-art KD frameworks and rivals the time-consuming pre-training method. Code will be made publicly available at