Although distributed machine learning has opened up numerous frontiers of research, partitioning large models across devices, nodes, and sites introduces significant communication overhead, making reliable training difficult. The focus on gradients as the primary statistic shared during training has led to a number of intuitive algorithms for distributed deep learning; however, gradient-based algorithms for training large deep neural networks (DNNs) are communication-heavy, often requiring additional modifications such as sparsity constraints, compression, quantization, and similar approaches to reduce bandwidth. We introduce a surprisingly simple statistic for training distributed DNNs that is more communication-friendly than the gradient. The error backpropagation process can be modified to share these smaller intermediate values instead of the gradient, reducing communication overhead with no impact on accuracy. The process provides the flexibility of averaging gradients during backpropagation, enabling novel, flexible training schemes while leaving room for further bandwidth reduction via existing gradient compression methods. Finally, consideration of the matrices used to compute the gradient inspires a new approach to compression via structured power iterations, which can not only reduce bandwidth but also enable introspection into distributed training dynamics, without significant performance loss.
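To give intuition for why intermediate backpropagation values can be cheaper to communicate than the gradient itself, consider a single dense layer: its minibatch gradient is a sum of outer products of per-example inputs and backpropagated error signals, so those factors are much smaller than the full gradient matrix whenever the batch is small relative to the layer dimensions. The sketch below illustrates this counting argument only; the exact statistic and protocol used in this work may differ, and all function names and sizes are illustrative assumptions.

```python
# Illustrative sketch only: why per-example backprop intermediates of a dense
# layer can be cheaper to send than the full gradient. Names are hypothetical.
import numpy as np

def layer_intermediates(a, delta):
    """Per-example layer inputs `a` (B x n) and error signals `delta` (B x m)
    that a worker could share instead of the m x n gradient matrix."""
    return a, delta

def reconstruct_gradient(a, delta):
    """Receiver rebuilds the minibatch gradient as a sum of outer products,
    G = delta^T @ a, identical to what local backprop would have produced."""
    return delta.T @ a

B, n, m = 32, 4096, 1024
a = np.random.randn(B, n).astype(np.float32)      # layer inputs
delta = np.random.randn(B, m).astype(np.float32)  # backpropagated errors

G_local = delta.T @ a                              # gradient a worker would normally send
G_rebuilt = reconstruct_gradient(*layer_intermediates(a, delta))
assert np.allclose(G_local, G_rebuilt)

sent_grad = m * n                                  # floats to send the full gradient
sent_stats = B * (n + m)                           # floats to send the intermediates
print(f"gradient: {sent_grad:,} floats, intermediates: {sent_stats:,} floats")
# Here 4,194,304 vs 163,840 floats (~25x smaller), and the savings hold
# whenever B * (n + m) < m * n.
```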
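The "structured power iterations" mentioned above refer to low-rank compression of gradient matrices. As a minimal sketch of that general idea (not this work's specific implementation; the rank, iteration count, and function names are assumptions), a power iteration can approximate an m x n gradient by two thin factors, reducing the payload from m*n to r*(m + n) values:

```python
# Minimal sketch of low-rank gradient compression via a power iteration.
# Rank `r`, iteration count, and function names are illustrative assumptions.
import numpy as np

def power_iteration_compress(G, r=8, iters=1):
    """Approximate an m x n matrix G with rank-r factors P (m x r) and Q (n x r),
    so only r*(m + n) values need to be communicated instead of m*n."""
    m, n = G.shape
    Q = np.random.randn(n, r).astype(G.dtype)
    for _ in range(iters):
        P = G @ Q                        # m x r projection
        P, _ = np.linalg.qr(P)           # orthonormalize the column basis
        Q = G.T @ P                      # n x r back-projection
    return P, Q

def decompress(P, Q):
    """Rank-r reconstruction of the original matrix."""
    return P @ Q.T

G = np.random.randn(1024, 4096).astype(np.float32)  # stand-in gradient matrix
P, Q = power_iteration_compress(G)
G_hat = decompress(P, Q)

compression = G.size / (P.size + Q.size)
rel_err = np.linalg.norm(G - G_hat) / np.linalg.norm(G)
print(f"compression ~{compression:.0f}x, relative error {rel_err:.2f}")
```

The rank-r factors also expose the dominant directions of the gradient, which is one way such a scheme can support introspection into training dynamics in addition to saving bandwidth.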