Title: Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding

Jun 01, 2023

Zi Yang, Samridhi Choudhary, Siegfried Kunzmann, Zheng Zhang

Abstract: Fine-tuned transformer models have shown superior performance in many natural language tasks. However, their large size prohibits deploying high-performance transformer models on resource-constrained devices. This paper proposes a quantization-aware, tensor-compressed training approach that reduces the model size, the number of arithmetic operations, and ultimately the runtime latency of transformer-based models. The embedding and linear layers of the transformer are compressed into small low-rank tensor cores, which significantly reduces the number of model parameters. Quantization-aware training with learnable scale factors is then used to obtain low-precision representations of the tensor-compressed models. The approach can be used for both end-to-end training and distillation-based training. To improve convergence, layer-by-layer distillation is applied to distill a quantized and tensor-compressed student model from a pre-trained transformer. The performance is demonstrated on two natural language understanding tasks, showing up to $63\times$ compression ratio, little accuracy loss, and remarkable inference and training speedup.
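The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a tensor-train style factorization for the "low-rank tensor cores" (the abstract does not name the format), and the layer size, factor dimensions, and ranks below are hypothetical values chosen only to make the arithmetic concrete. The quantizer shows only the forward pass of symmetric uniform quantization with a given scale factor; in quantization-aware training the scale would be a trainable parameter, which this sketch does not model.

```python
from math import prod


def dense_params(in_dims, out_dims):
    """Parameter count of an uncompressed weight matrix of shape
    prod(in_dims) x prod(out_dims)."""
    return prod(in_dims) * prod(out_dims)


def tt_matrix_params(in_dims, out_dims, ranks):
    """Parameter count of a tensor-train style matrix factorization (assumed format).

    Core k has shape (ranks[k], in_dims[k], out_dims[k], ranks[k + 1]),
    with boundary ranks ranks[0] == ranks[-1] == 1.
    """
    assert len(in_dims) == len(out_dims) == len(ranks) - 1
    assert ranks[0] == 1 and ranks[-1] == 1
    return sum(
        ranks[k] * in_dims[k] * out_dims[k] * ranks[k + 1]
        for k in range(len(in_dims))
    )


def fake_quantize(x, scale, bits=8):
    """Forward pass of symmetric uniform quantization for one value.

    x is divided by the scale factor, rounded to the nearest integer,
    clamped to the signed `bits`-bit range, and rescaled.
    """
    qmax = 2 ** (bits - 1) - 1
    qmin = -qmax - 1
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale


# Hypothetical example: a 768 x 768 linear layer factorized into three cores.
in_dims, out_dims, ranks = (12, 8, 8), (8, 8, 12), (1, 12, 12, 1)
full = dense_params(in_dims, out_dims)
compressed = tt_matrix_params(in_dims, out_dims, ranks)
print(full, compressed, full / compressed)
```

For these hypothetical shapes the three cores hold 1152, 9216, and 1152 parameters. The per-layer ratio depends entirely on the chosen ranks and is unrelated to the $63\times$ figure reported in the abstract, which covers the whole model and includes the effect of low-precision storage.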
