Title: Transformer-based Knowledge Distillation for Efficient Semantic Segmentation of Road-driving Scenes

Feb 27, 2022

Ruiping Liu, Kailun Yang, Huayao Liu, Jiaming Zhang, Kunyu Peng, Rainer Stiefelhagen

Abstract: For scene understanding in robotics and automated driving, there is growing interest in solving semantic segmentation tasks with transformer-based methods. However, effective transformers are always too cumbersome and computationally expensive to solve semantic segmentation in real time, which is desired for robotic systems. Moreover, because transformers lack the inductive biases of Convolutional Neural Networks (CNNs), pre-training on a large dataset is essential, but it takes a long time. Knowledge Distillation (KD) speeds up inference while maintaining accuracy by transferring knowledge from a pre-trained cumbersome teacher model to a compact student model. Most traditional KD methods for CNNs focus on response-based and feature-based knowledge. In contrast, we present a novel KD framework tailored to the nature of transformers, i.e., training compact transformers by transferring knowledge from the feature maps and patch embeddings of large transformers. To this end, two modules are proposed: (1) the Selective Kernel Fusion (SKF) module, which helps to construct an efficient relation-based KD framework, Selective Kernel Review (SKR); and (2) the Patch Embedding Alignment (PEA) module, which performs the dimensional transformation of patch embeddings. The combined KD framework is called SKR+PEA. Comprehensive experiments on the Cityscapes and ACDC datasets indicate that our proposed approach outperforms recent state-of-the-art KD frameworks and rivals the time-consuming pre-training method. Code will be made publicly available at https://github.com/RuipingL/SKR_PEA.git
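For readers unfamiliar with the response-based KD that the abstract contrasts against, the following is a minimal sketch of the classic temperature-softened distillation loss for a single pixel's class logits. It is not the SKR+PEA method and is not taken from the authors' repository; the function names `softmax` and `response_kd_loss` and the default temperature of 2.0 are illustrative assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def response_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened class distributions, scaled by T^2.

    Hypothetical helper for illustration only: a generic response-based KD
    loss, not the relation-based SKR loss or the PEA module from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        t * (math.log(t) - math.log(s))
        for t, s in zip(p_teacher, p_student)
        if t > 0.0
    )
    return kl * temperature ** 2


# Example: class logits for one pixel from a teacher and a student (made-up values).
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
loss = response_kd_loss(student, teacher)
```

In a segmentation setting this loss would typically be averaged over all pixels; the loss is zero when the student's softened distribution matches the teacher's and positive otherwise.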
