Title: Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging

Apr 30, 2020

Shigang Li, Tal Ben-Nun, Dan Alistarh, Salvatore Di Girolamo, Nikoli Dryden, Torsten Hoefler

Abstract: Deep learning at scale is dominated by communication time. Distributing samples across nodes usually yields the best performance, but poses scaling challenges due to global information dissemination and load imbalance across uneven sample lengths. State-of-the-art decentralized optimizers mitigate the problem, but require more iterations to achieve the same accuracy as their globally-communicating counterparts. We present Wait-Avoiding Group Model Averaging (WAGMA) SGD, a wait-avoiding stochastic optimizer that reduces global communication via subgroup weight exchange. The key insight is a combination of algorithmic changes to the averaging scheme and the use of a group allreduce operation. We prove the convergence of WAGMA-SGD, and empirically show that it retains convergence rates equivalent to Allreduce-SGD. For evaluation, we train ResNet-50 on ImageNet; Transformer for machine translation; and deep reinforcement learning for navigation at scale. Compared with state-of-the-art decentralized SGD, WAGMA-SGD significantly improves training throughput (by 2.1x on 1,024 GPUs) and achieves the fastest time-to-solution.
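
To make the idea of subgroup weight exchange concrete, the following is a minimal single-process sketch in Python with NumPy. It is not the paper's implementation: the abstract does not say how groups are formed or how waiting is avoided, so the function name `group_average`, the contiguous group layout, and the `shift` parameter are all hypothetical. The sketch only shows that averaging model replicas inside disjoint groups leaves the mean over all workers unchanged, and that a group spanning every worker reduces to a global average.

```python
import numpy as np


def group_average(weights, group_size, shift=0):
    """Average model replicas within disjoint groups of workers.

    weights:    array of shape (num_workers, num_params), one row per worker.
    group_size: number of workers per group (must divide num_workers).
    shift:      hypothetical knob that rotates the worker-to-group mapping,
                so that different workers are grouped in different iterations.
    """
    weights = np.asarray(weights, dtype=float)
    num_workers = weights.shape[0]
    if num_workers % group_size != 0:
        raise ValueError("group_size must divide the number of workers")

    # Assumed grouping: rotate worker ids, then cut into contiguous groups.
    order = np.roll(np.arange(num_workers), shift)
    averaged = np.empty_like(weights)
    for start in range(0, num_workers, group_size):
        members = order[start:start + group_size]
        # Every member of a group receives the group's mean weights.
        averaged[members] = weights[members].mean(axis=0)
    return averaged


# Four simulated workers, two parameters each.
replicas = np.arange(8.0).reshape(4, 2)
pairs = group_average(replicas, group_size=2)
everyone = group_average(replicas, group_size=4)
```

In a real distributed run each group's mean would come from a collective operation restricted to that group rather than from a loop over rows; the loop here only stands in for that communication step.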