Title: Patch-wise Mixed-Precision Quantization of Vision Transformer

May 11, 2023

Junrui Xiao, Zhikai Li, Lianwei Yang, Qingyi Gu

Abstract: As emerging hardware begins to support mixed bit-width arithmetic, mixed-precision quantization is widely used to reduce the complexity of neural networks. However, Vision Transformers (ViTs) rely on complex self-attention computation to learn powerful feature representations, which still makes mixed-precision quantization of ViTs challenging. In this paper, we propose a novel patch-wise mixed-precision quantization (PMQ) method for efficient inference of ViTs. Specifically, we design a lightweight global metric, faster to compute than existing methods, that measures the sensitivity of each component in ViTs to quantization errors. We also introduce a Pareto frontier approach that automatically allocates the optimal bit-precision according to this sensitivity. To further reduce the computational complexity of self-attention at the inference stage, we propose a patch-wise module that reallocates the bit-width of patches in each layer. Extensive experiments on the ImageNet dataset show that our method greatly reduces the search cost and facilitates the application of mixed-precision quantization to ViTs.
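The abstract does not say how the Pareto frontier is computed. The code below is a hypothetical sketch of the general idea, not the authors' PMQ algorithm. It assumes three things: sensitivity scores for each (layer, bit-width) pair are already available, those scores add up across layers, and cost is parameters times bit-width. Under those assumptions it builds the set of non-dominated bit configurations one layer at a time, then picks the least sensitive configuration that fits a size budget. All function names (`pareto_front`, `allocate`, `pick`) and all numbers are illustrative.

```python
# Hypothetical sketch: Pareto-frontier bit allocation over per-layer sensitivities.
# ASSUMPTIONS (not from the paper): sensitivity scores are precomputed per
# (layer, bit-width) and add up across layers; cost = parameters x bit-width.

def pareto_front(points):
    """Keep (cost, sensitivity, config) tuples not dominated on both axes (lower is better)."""
    front = []
    best_sens = float("inf")
    for cost, sens, cfg in sorted(points, key=lambda p: (p[0], p[1])):
        # After sorting by cost, a point survives only if it beats every cheaper point.
        if sens < best_sens:
            front.append((cost, sens, cfg))
            best_sens = sens
    return front


def allocate(sensitivity, params, bit_choices):
    """Build the Pareto front of full bit configurations layer by layer."""
    front = [(0, 0.0, ())]
    for layer, n_params in enumerate(params):
        candidates = [
            (cost + n_params * b, sens + sensitivity[layer][b], cfg + (b,))
            for cost, sens, cfg in front
            for b in bit_choices
        ]
        # Pruning dominated partial configs is safe because both objectives are additive.
        front = pareto_front(candidates)
    return front


def pick(front, budget):
    """Return the lowest-sensitivity configuration whose cost fits the budget."""
    feasible = [p for p in front if p[0] <= budget]
    return min(feasible, key=lambda p: p[1]) if feasible else None


# Toy numbers for illustration only.
params = [100, 200, 100]
sensitivity = [{4: 0.5, 8: 0.1}, {4: 0.2, 8: 0.15}, {4: 0.9, 8: 0.05}]
front = allocate(sensitivity, params, bit_choices=(4, 8))
best = pick(front, budget=6 * sum(params))  # average of 6 bits per parameter
print(best[0], best[2])  # 2400 (8, 4, 8)
```

Pruning to the non-dominated set after each layer means the search never enumerates every combination of bit-widths across layers. With additive objectives, discarding a dominated partial configuration cannot lose the optimum.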
