Vijayalakshmi Srinivasan

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

Apr 21, 2021

Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Xiao Sun, Naigang Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Zhang(+1 more)

Abstract: Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication constrained. To overcome this limitation, numerous gradient compression techniques have been proposed and have demonstrated high compression ratios. However, most existing methods do not scale well to large-scale distributed systems (due to gradient build-up) and/or fail to evaluate model fidelity (test accuracy) on large datasets. To mitigate these issues, we propose a new compression technique, Scalable Sparsified Gradient Compression (ScaleCom), that leverages similarity in the gradient distribution amongst learners to provide significantly improved scalability. Using theoretical analysis, we show that ScaleCom provides favorable convergence guarantees and is compatible with gradient all-reduce techniques. Furthermore, we experimentally demonstrate that ScaleCom has small overheads, directly reduces gradient traffic, and provides high compression rates (65-400X) and excellent scalability (up to 64 learners and 8-12X larger batch sizes over standard training) across a wide range of applications (image, language, and speech) without significant accuracy loss.
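The abstract does not spell out the mechanism, so the following is only a minimal sketch of one way sparsified compression can stay compatible with all-reduce. It assumes that a single "leader" learner picks the top-k coordinates and every learner sends values at those same coordinates, with unsent values kept as a local residual (error feedback). The function names, the `leader` argument, and the plain-list representation are illustrative assumptions, not the authors' implementation.

```python
def top_k_indices(values, k):
    """Indices of the k largest-magnitude entries, in ascending index order."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    return sorted(order[:k])


def shared_index_step(grads, residuals, k, leader):
    """One compression step in which all learners send the same k coordinates.

    grads and residuals are lists (one entry per learner) of lists of floats.
    Returns (indices, averaged sparse values, updated residuals).
    """
    # Error feedback: add back whatever each learner left unsent earlier.
    accumulated = [
        [g + r for g, r in zip(gs, rs)] for gs, rs in zip(grads, residuals)
    ]
    # Assumption: only the leader chooses coordinates; the rest reuse them.
    indices = top_k_indices(accumulated[leader], k)
    n = len(grads)
    # Matching indices make the reduction an element-wise average of k values,
    # which is the shape of work an all-reduce performs.
    averaged = [sum(acc[i] for acc in accumulated) / n for i in indices]
    # Each learner keeps the unsent part as its residual for the next step.
    sent = set(indices)
    new_residuals = [
        [0.0 if i in sent else v for i, v in enumerate(acc)]
        for acc in accumulated
    ]
    return indices, averaged, new_residuals
```

Calling `shared_index_step(grads, residuals, k=2, leader=step % len(grads))` on each step would rotate the leader among learners; that rotation rule is likewise an assumption of this sketch. Because every learner contributes exactly `k` values at identical positions, the message size does not grow with the number of learners, which is one plausible reading of the "gradient build-up" problem the abstract mentions.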

* Accepted at NeurIPS 2020: https://proceedings.neurips.cc/paper/2020/hash/9d58963592071dbf38a0fa114269959c-Abstract.html

Bridging the Accuracy Gap for 2-bit Quantized Neural Networks

Jul 17, 2018

Jungwook Choi, Pierce I-Jen Chuang, Zhuo Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan

Abstract: Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To reduce this cost, several quantization schemes have gained attention recently, with some focusing on weight quantization and others on quantizing activations. This paper proposes novel techniques that target weight and activation quantization separately, resulting in an overall quantized neural network (QNN). The activation quantization technique, PArameterized Clipping acTivation (PACT), uses an activation clipping parameter $\alpha$ that is optimized during training to find the right quantization scale. The weight quantization scheme, statistics-aware weight binning (SAWB), finds the optimal scaling factor that minimizes the quantization error based on the statistical characteristics of the distribution of weights without the need for an exhaustive search. The combination of PACT and SAWB results in a 2-bit QNN that achieves state-of-the-art classification accuracy (comparable to full precision networks) across a range of popular models and datasets.

* arXiv admin note: substantial text overlap with arXiv:1805.06085

PACT: Parameterized Clipping Activation for Quantized Neural Networks

Jul 17, 2018

Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan

Abstract: Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To address this cost, a number of quantization schemes have been proposed, but most of these techniques focused on quantizing weights, which are relatively smaller in size compared to activations. This paper proposes a novel quantization scheme for activations during training that enables neural networks to work well with ultra-low-precision weights and activations without any significant accuracy degradation. This technique, PArameterized Clipping acTivation (PACT), uses an activation clipping parameter $\alpha$ that is optimized during training to find the right quantization scale. PACT allows quantizing activations to arbitrary bit precisions, while achieving much better accuracy relative to published state-of-the-art quantization schemes. We show, for the first time, that both weights and activations can be quantized to 4 bits of precision while still achieving accuracy comparable to full precision networks across a range of popular models and datasets. We also show that exploiting these reduced-precision computational units in hardware can enable a super-linear improvement in inference performance due to a significant reduction in the area of accelerator compute engines coupled with the ability to retain the quantized model and activation data in on-chip memories.
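The abstract describes a clipping parameter $\alpha$ that sets the quantization scale but gives no formulas. The sketch below shows one common way such a scheme can be written: clip the activation to $[0, \alpha]$, then round it to one of $2^k$ uniformly spaced levels. The function names, the scalar interface, and the straight-through gradient rule for $\alpha$ are assumptions made for illustration and should not be read as the authors' code.

```python
def pact_forward(x, alpha, bits):
    """Clip x to [0, alpha], then round to one of 2**bits uniform levels."""
    clipped = min(max(x, 0.0), alpha)
    steps = 2 ** bits - 1  # number of intervals between the levels
    return round(clipped * steps / alpha) * alpha / steps


def pact_alpha_grad(x, alpha):
    """Assumed straight-through gradient of the output with respect to alpha.

    Rounding is treated as the identity, so only the clipped region
    (x >= alpha) passes gradient to alpha.
    """
    return 1.0 if x >= alpha else 0.0
```

With `alpha = 6.0` and `bits = 2`, the representable outputs are 0, 2, 4, and 6. An input of 7.5 saturates at 6.0 and contributes gradient to `alpha`, while an input of 2.9 rounds to 2.0 and contributes none. A larger `alpha` widens the range at the cost of coarser steps, which is the trade-off that training `alpha` is meant to resolve.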
