Abstract:Vision transformers (ViTs) have garnered significant attention for their performance in vision tasks; however, their high computational cost and significant latency have hindered widespread adoption. Post-training quantization (PTQ), a promising method for model compression, still faces accuracy degradation challenges with ViTs. There are two reasons for this: the existing quantization paradigm does not fit the power-law distribution of post-Softmax activations well, and accuracy inevitably decreases after reparameterizing post-LayerNorm activations. We propose a Distribution-Friendly and Outlier-Aware Post-training Quantization method for Vision Transformers, named DopQ-ViT. DopQ-ViT analyzes the inefficiencies of current quantizers and introduces a distribution-friendly Tan Quantizer called TanQ. TanQ focuses more on values near 1, more accurately preserving the power-law distribution of post-Softmax activations, and achieves favorable results. Moreover, when reparameterizing post-LayerNorm activations from channel-wise to layer-wise quantization, the accuracy degradation is mainly due to the significant impact of outliers in the scaling factors. Therefore, DopQ-ViT proposes a method to Search for the Optimal Scaling Factor, denoted as SOSF, which compensates for the influence of outliers and preserves the performance of the quantized model. DopQ-ViT has undergone extensive validation and demonstrates significant performance improvements in quantized models, particularly in low-bit settings.
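To make the "focus on values near 1" idea concrete, below is a minimal simulated-quantization sketch of a tan-warped quantizer for post-Softmax attention maps. The warp steepness `theta` and the uniform grid in the warped domain are illustrative assumptions, not DopQ-ViT's exact TanQ formulation or its scale search.

```python
import torch


def tanq_fake_quant(attn, n_bits=4, theta=1.3):
    """Illustrative (simulated) tan-based quantizer for post-Softmax maps.

    Sketch only: warp [0, 1] with tan(theta * x), which is steep near 1,
    then quantize uniformly in the warped domain so values close to 1
    receive finer resolution. `theta` is a hypothetical steepness knob;
    the real TanQ derives its parameters differently.
    """
    levels = 2 ** n_bits - 1
    norm = torch.tan(torch.tensor(theta))
    warped = torch.tan(attn.clamp(0.0, 1.0) * theta) / norm   # warped domain, in [0, 1]
    q = torch.round(warped * levels).clamp(0, levels)         # integer code
    return torch.atan(q / levels * norm) / theta              # dequantized value
```

With `theta` close to pi/2 the grid becomes extremely dense near 1 and coarse near 0; smaller values of `theta` trade this off, which is the kind of behavior a distribution-friendly quantizer for power-law activations is after.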
Abstract:Post-training quantization (PTQ) efficiently compresses vision models, but it is unfortunately accompanied by a certain degree of accuracy degradation. Reconstruction methods aim to enhance model performance by narrowing the gap between the quantized model and the full-precision model, often yielding promising results. However, efforts to significantly improve the performance of PTQ through reconstruction in Vision Transformers (ViTs) have shown limited efficacy. In this paper, we conduct a thorough analysis of the reasons for this limited effectiveness and propose MGRQ (Mixed Granularity Reconstruction Quantization) as a solution. Unlike previous reconstruction schemes, MGRQ introduces a mixed granularity reconstruction approach. Specifically, MGRQ enhances the performance of PTQ by introducing Extra-Block Global Supervision and Intra-Block Local Supervision, building upon Optimized Block-wise Reconstruction. Extra-Block Global Supervision considers the relationship between block outputs and the model's output, aiding block-wise reconstruction through global supervision. Meanwhile, Intra-Block Local Supervision reduces generalization error by aligning the distribution of outputs at each layer within a block. The losses at different granularities are then combined through Mixed Granularity Loss Fusion. Extensive experiments conducted on various ViT models illustrate the effectiveness of MGRQ. Notably, MGRQ demonstrates robust performance in low-bit quantization, thereby enhancing the practicality of the quantized model.
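The structure of the fused objective can be sketched as below, assuming MSE for block-level and layer-level alignment and a KL term for model-level supervision; the actual loss terms and the weights `alpha` and `beta` used by MGRQ may differ.

```python
import torch
import torch.nn.functional as F


def mixed_granularity_loss(q_block_out, fp_block_out,
                           q_logits, fp_logits,
                           q_layer_outs, fp_layer_outs,
                           alpha=1.0, beta=1.0):
    """Minimal sketch of fusing reconstruction losses at three granularities.

    This only illustrates the structure suggested by the abstract; the exact
    terms and (hypothetical) weights in MGRQ may differ.
    """
    # Block-wise reconstruction: match the quantized block output to the FP output.
    block_loss = F.mse_loss(q_block_out, fp_block_out)

    # Extra-block global supervision: align model-level predictions.
    global_loss = F.kl_div(F.log_softmax(q_logits, dim=-1),
                           F.softmax(fp_logits, dim=-1),
                           reduction="batchmean")

    # Intra-block local supervision: align each layer's output inside the block.
    local_loss = sum(F.mse_loss(q, fp) for q, fp in zip(q_layer_outs, fp_layer_outs))

    return block_loss + alpha * global_loss + beta * local_loss
```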
Abstract:Vision Transformers (ViTs) have emerged as the fundamental architecture for most computer vision fields, but their considerable memory and computation costs hinder deployment on resource-limited devices. As one of the most powerful compression methods, binarization reduces the computation of the neural network by quantizing the weights and activations to $\pm$1. Although existing binarization methods have demonstrated excellent performance on Convolutional Neural Networks (CNNs), the full binarization of ViTs remains under-studied and suffers a significant performance drop. In this paper, we first argue empirically that the severe performance degradation is mainly caused by weight oscillation during binarization training and information distortion in the activations of ViTs. Based on these analyses, we propose $\textbf{BinaryViT}$, an accurate full binarization scheme for ViTs, which pushes the quantization of ViTs to the limit. Specifically, we propose a novel gradient regularization scheme (GRS) that drives a bimodal distribution of the weights to reduce oscillation during binarization training. Moreover, we design an activation shift module (ASM) that adaptively tunes the activation distribution to reduce the information distortion caused by binarization. Extensive experiments on the ImageNet dataset show that our BinaryViT consistently surpasses the strong baseline by 2.05% and improves the accuracy of fully binarized ViTs to a usable level. Furthermore, our method achieves impressive savings of 16.2$\times$ in model size and 17.7$\times$ in OPs compared to the full-precision DeiT-S. The code and models will be released on GitHub.
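A minimal sketch of how sign binarization with a straight-through estimator and a learnable pre-binarization shift might look is given below; the channel-wise shift parameterization here is a hypothetical stand-in and does not claim to reproduce the paper's ASM or GRS.

```python
import torch
import torch.nn as nn


class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a clipped straight-through estimator."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Strict +/-1 output (sign(0) is mapped to +1).
        return torch.where(x >= 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Pass gradients only where |x| <= 1 (standard clipped STE).
        return grad_out * (x.abs() <= 1).to(grad_out.dtype)


class ShiftedBinarize(nn.Module):
    """Hypothetical channel-wise shift applied before binarization, loosely
    mirroring the idea of adaptively tuning the activation distribution."""

    def __init__(self, channels):
        super().__init__()
        self.shift = nn.Parameter(torch.zeros(channels))

    def forward(self, x):            # x: (batch, tokens, channels)
        return BinarizeSTE.apply(x - self.shift)
```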
Abstract:As emerging hardware begins to support mixed bit-width arithmetic computation, mixed-precision quantization is widely used to reduce the complexity of neural networks. However, Vision Transformers (ViTs) require complex self-attention computation to guarantee the learning of powerful feature representations, which makes mixed-precision quantization of ViTs still challenging. In this paper, we propose a novel patch-wise mixed-precision quantization (PMQ) method for efficient inference of ViTs. Specifically, we design a lightweight global metric, which is faster to compute than existing methods, to measure the sensitivity of each component in ViTs to quantization errors. Moreover, we introduce a Pareto frontier approach to automatically allocate the optimal bit-precision according to the sensitivity. To further reduce the computational complexity of self-attention in the inference stage, we propose a patch-wise module to reallocate the bit-width of patches in each layer. Extensive experiments on the ImageNet dataset show that our method greatly reduces the search cost and facilitates the application of mixed-precision quantization to ViTs.
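One way to picture sensitivity-driven bit allocation with a Pareto frontier is the toy search below. The per-layer `sensitivity` scores, the proxy error `s / 2^b`, and the exhaustive enumeration (feasible only for a handful of layers) are illustrative assumptions rather than PMQ's actual metric or search procedure.

```python
from itertools import product


def pareto_bit_allocation(sensitivity, sizes, bit_choices=(4, 6, 8), budget=None):
    """Toy sketch of sensitivity-driven mixed-precision search, not PMQ itself.

    `sensitivity[i]`: assumed per-layer score (higher = more sensitive).
    `sizes[i]`: parameter count of layer i. Each configuration is scored by
    total model size (bits) and a sensitivity-weighted proxy error; Pareto-
    optimal configurations are kept and the best one within `budget` returned.
    """
    configs = []
    for bits in product(bit_choices, repeat=len(sensitivity)):
        cost = sum(b * n for b, n in zip(bits, sizes))               # total bits
        err = sum(s / (2 ** b) for s, b in zip(sensitivity, bits))   # proxy error
        configs.append((cost, err, bits))

    # Keep the Pareto frontier: drop configs beaten on error at no extra cost.
    frontier = [c for c in configs
                if not any(o[0] <= c[0] and o[1] < c[1] for o in configs)]

    feasible = [c for c in frontier if budget is None or c[0] <= budget]
    return min(feasible, key=lambda c: c[1])[2] if feasible else None
```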
Abstract:Post-training quantization (PTQ), which only requires a tiny dataset for calibration without end-to-end retraining, is a light and practical model compression technique. Recently, several PTQ schemes for vision transformers (ViTs) have been presented; unfortunately, they typically suffer from non-trivial accuracy degradation, especially in low-bit cases. In this paper, we propose RepQ-ViT, a novel PTQ framework for ViTs based on quantization scale reparameterization, to address the above issues. RepQ-ViT decouples the quantization and inference processes, where the former employs complex quantizers and the latter employs scale-reparameterized simplified quantizers. This ensures both accurate quantization and efficient inference, which distinguishes it from existing approaches that sacrifice quantization performance to meet the target hardware. More specifically, we focus on two components with extreme distributions: post-LayerNorm activations with severe inter-channel variation and post-Softmax activations with power-law features, and initially apply channel-wise quantization and log$\sqrt{2}$ quantization, respectively. Then, we reparameterize the scales to hardware-friendly layer-wise quantization and log2 quantization for inference, at only a slight cost in accuracy or computation. Extensive experiments are conducted on multiple vision tasks with different model variants, proving that RepQ-ViT, without hyperparameters and expensive reconstruction procedures, can outperform existing strong baselines and encouragingly improve the accuracy of 4-bit PTQ of ViTs to a usable level.
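The channel-wise to layer-wise step can be illustrated with the following sketch, which folds per-channel variation factors into the LayerNorm affine parameters and compensates in the following linear layer so the full-precision computation is preserved. The choice of shared scale and zero point (the mean here), the quantizer convention, and the in-place update details are assumptions for illustration and may differ from RepQ-ViT's exact formulas.

```python
import torch


@torch.no_grad()
def reparam_ln_to_layerwise(ln, linear, s, z):
    """Sketch: convert channel-wise post-LayerNorm quantization params (s, z)
    into a single layer-wise pair, absorbing the per-channel factors into
    `ln` (nn.LayerNorm) and the next `linear` (e.g. the qkv projection),
    assuming quantization X_q = round(X / s) + z and `linear.bias` is present.
    """
    s_tilde, z_tilde = s.mean(), z.float().mean().round()
    r = s / s_tilde                                  # per-channel variation factor

    # Fold the factor into LayerNorm's affine parameters ...
    ln.weight.div_(r)
    ln.bias.div_(r).add_(s_tilde * (z - z_tilde))

    # ... and compensate in the next linear layer so outputs are unchanged.
    linear.weight.mul_(r)                            # rescale each input channel
    linear.bias.add_(linear.weight @ (s_tilde * (z_tilde - z)))

    return s_tilde, z_tilde
```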