Title: Mixed Non-linear Quantization for Vision Transformers

Jul 26, 2024

Gihwan Kim, Jemin Lee, Sihyeong Park, Yongin Kwon, Hyungshin Kim

Abstract: Most quantization methods proposed to reduce the model size of Vision Transformers overlook the quantization of non-linear operations. The few works that do address non-linear operations apply a single quantization method to all of them. We believe this can be improved by employing a different quantization method for each non-linear operation. To assign each non-linear layer the method, among the known methods, that minimizes its error, we propose mixed non-linear quantization, which considers layer-wise quantization sensitivity measured by an SQNR difference metric. The results show that our method outperforms I-BERT, FQ-ViT, and I-ViT in both 8-bit and 6-bit settings for ViT, DeiT, and Swin models by an average of 0.6%p and 19.6%p, respectively. When training time is limited, our method outperforms I-BERT and I-ViT by 0.6%p and 20.8%p, respectively. We plan to release our code at https://gitlab.com/ones-ai/mixed-non-linear-quantization.

* 16 pages, 4 figures, under review
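
The selection idea in the abstract (score each candidate method on a non-linear layer and keep the one with the least error) can be illustrated with a small sketch. This is not the authors' implementation: the candidate GELU approximations, the `fake_quantize` helper, the layer names, and the rule of simply picking the highest SQNR are all assumptions made for illustration, and the paper's actual SQNR difference metric may be defined differently.

```python
import numpy as np

def sqnr_db(reference, approx):
    """Signal-to-quantization-noise ratio in dB of approx against reference."""
    reference = np.asarray(reference, dtype=np.float64)
    noise = np.sum((reference - approx) ** 2)
    if noise == 0:
        return float("inf")
    return 10.0 * np.log10(np.sum(reference ** 2) / noise)

def fake_quantize(x, num_bits=8):
    """Symmetric uniform quantize-dequantize (hypothetical helper)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    if scale == 0:
        return np.array(x, dtype=np.float64)
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

def gelu_reference(x):
    """Floating-point GELU (tanh form), used as the ground truth."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

# Stand-in candidates, NOT the I-BERT / FQ-ViT / I-ViT kernels.
CANDIDATES = {
    "sigmoid": lambda x: x / (1.0 + np.exp(-1.702 * x)),
    "relu": lambda x: np.maximum(x, 0.0),
}

def select_method(x, reference_fn, candidates, num_bits=8):
    """Return the candidate with the highest SQNR on activations x, plus all scores."""
    reference = reference_fn(x)
    x_q = fake_quantize(x, num_bits)
    scores = {
        name: sqnr_db(reference, fake_quantize(fn(x_q), num_bits))
        for name, fn in candidates.items()
    }
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical per-layer calibration activations with different ranges.
    layers = {
        "block0.gelu": rng.normal(0.0, 1.0, 4096),
        "block1.gelu": rng.normal(0.0, 3.0, 4096),
    }
    for layer_name, activations in layers.items():
        best, scores = select_method(activations, gelu_reference, CANDIDATES)
        print(layer_name, best, {k: round(v, 2) for k, v in scores.items()})
```

Running the selection per layer, rather than once for the whole model, is what makes the assignment "mixed": two layers with different activation statistics may end up with different methods.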
