Title: GSB: Group Superposition Binarization for Vision Transformer with Limited Training Samples

May 19, 2023

Tian Gao, Cheng-Zhong Xu, Le Zhang, Hui Kong

Abstract: Because of its massive number of parameters, ViT usually overfits severely when the number of training samples is relatively limited. In addition, ViT generally demands heavy computing resources, which limits its deployment on resource-constrained devices. As a type of model-compression method, model binarization is potentially a good choice for solving both problems. Compared with its full-precision counterpart, a binarized model replaces complex tensor multiplication with simple bit-wise binary operations and represents full-precision parameters and activations with 1-bit values, which potentially reduces computational complexity and model size, respectively. In this paper, we find that the accuracy drop of the binary ViT model is mainly due to information loss in the Attention module and the Value vector. Therefore, we propose a novel model binarization technique, called Group Superposition Binarization (GSB), to deal with these issues. Furthermore, to further improve the performance of the binarized model, we investigate the gradient calculation procedure in the binarization process and derive more suitable gradient equations for GSB to reduce the influence of gradient mismatch. Knowledge distillation is then introduced to alleviate the performance degradation caused by model binarization. Experiments on three datasets with limited numbers of training samples demonstrate that the proposed GSB model achieves state-of-the-art performance among binary quantization schemes and exceeds its full-precision counterpart on some indicators.
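To make the abstract's point about bit-wise operations concrete, the following is a minimal sketch of generic sign binarization and an XNOR-popcount dot product. It is not the GSB method itself, whose details the abstract does not give. The helper names `binarize`, `pack_bits`, and `xnor_popcount_dot` are hypothetical and chosen only for illustration.

```python
import random

def binarize(xs):
    """Map real values to {-1, +1} with the sign function (0 maps to +1 by assumption)."""
    return [1 if x >= 0 else -1 for x in xs]

def pack_bits(b):
    """Encode a {-1, +1} vector as an int: bit i is set where b[i] == +1."""
    bits = 0
    for i, v in enumerate(b):
        if v == 1:
            bits |= 1 << i
    return bits

def xnor_popcount_dot(a_bits, w_bits, n):
    """Dot product of two length-n {-1, +1} vectors computed from their bit encodings."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ w_bits) & mask   # XNOR: 1 wherever the two signs agree
    matches = bin(agree).count("1")     # popcount of agreeing positions
    return 2 * matches - n              # agreements minus disagreements

# Compare against an ordinary multiply-accumulate on the binarized vectors.
random.seed(0)
a = [random.gauss(0, 1) for _ in range(64)]
w = [random.gauss(0, 1) for _ in range(64)]
a_b, w_b = binarize(a), binarize(w)
reference = sum(x * y for x, y in zip(a_b, w_b))
fast = xnor_popcount_dot(pack_bits(a_b), pack_bits(w_b), len(a_b))
assert fast == reference
```

Each 1-bit value stands in for a full-precision number, so 64 weights fit in a single machine word here, and the multiply-accumulate becomes one XNOR plus a population count. The sign function discards magnitude, which is the kind of information loss the abstract attributes to the Attention module and the Value vector.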
