Abstract: We present FPRaker, a processing element for composing training accelerators. FPRaker processes several floating-point multiply-accumulate operations concurrently and accumulates their results into a higher-precision accumulator. It boosts performance and energy efficiency during training by exploiting the values that naturally appear during training: it processes the significand of each multiply-accumulate operand as a series of signed powers of two, converting to this form on the fly. The encoding exposes ineffectual work that can be skipped: encoded values have few terms, and some of those terms can be discarded because they would fall outside the range of the accumulator given its limited floating-point precision. We demonstrate that FPRaker can be used to compose a training accelerator and that it improves performance and energy efficiency compared to conventional floating-point units under iso-compute-area constraints. We also demonstrate that FPRaker delivers additional benefits when training incorporates pruning and quantization. Finally, we show that FPRaker naturally amplifies performance with training methods that use a different precision per layer.
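The sketch below is not the FPRaker datapath; it is a minimal software model of the idea the abstract describes, under assumed parameters. The recoding used here is a Booth-like signed-digit form (one of several possible signed power-of-two encodings), and the names `signed_pow2_terms`, `term_serial_mac`, `sig_bits`, and `acc_bits` are hypothetical. It shows how a significand can be streamed as signed power-of-two terms and how terms falling below the accumulator's precision window can be skipped as ineffectual.

```python
import math


def signed_pow2_terms(sig: int, n: int):
    """Booth-like recoding of an n-bit significand into signed powers of two.

    Returns a list of (d, i) pairs with d in {-1, +1} such that
    sig == sum(d * 2**i for d, i in terms).  Illustrative only; the actual
    FPRaker encoding may differ.
    """
    terms = []
    for i in range(n + 1):
        b_i = (sig >> i) & 1 if i < n else 0
        b_prev = (sig >> (i - 1)) & 1 if i > 0 else 0
        d = b_prev - b_i                      # signed digit in {-1, 0, +1}
        if d:
            terms.append((d, i))
    return terms


def term_serial_mac(a: float, b: float, acc: float,
                    acc_bits: int = 24, sig_bits: int = 11) -> float:
    """Accumulate a*b into acc one signed power-of-two term at a time.

    Terms whose contribution lies more than acc_bits below the accumulator's
    magnitude are skipped: they could not change the limited-precision result.
    acc_bits and sig_bits are assumed illustration values, and a real
    accumulator would track this window dynamically.
    """
    if a == 0.0 or b == 0.0:
        return acc
    m_a, e_a = math.frexp(abs(a))             # |a| = m_a * 2**e_a, m_a in [0.5, 1)
    sig_a = int(m_a * (1 << sig_bits))        # fixed-point significand of a
    cutoff = (math.frexp(abs(acc))[1] - acc_bits) if acc != 0.0 else -(10 ** 9)
    sign = math.copysign(1.0, a)
    for d, i in signed_pow2_terms(sig_a, sig_bits):
        # This term contributes sign * d * 2**(e_a - sig_bits + i) * b.
        term = sign * d * math.ldexp(b, e_a - sig_bits + i)
        if math.frexp(abs(term))[1] < cutoff:
            continue                          # ineffectual: below the accumulator window
        acc += term
    return acc


# Example: 1.5 * 2.0 needs only two signed terms instead of a full multiply.
print(term_serial_mac(1.5, 2.0, acc=0.0))     # 3.0
```

In this toy model the skipped terms correspond to the work FPRaker elides in hardware; the fewer the surviving terms, the fewer cycles a term-serial processing element would spend per multiply-accumulate.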
Abstract: Data-parallel training is widely used to scale distributed deep neural network (DNN) training. However, its performance benefits are often limited by the communication-heavy parameter synchronization step. In this paper, we exploit domain-specific knowledge of DNN training to overlap parameter synchronization with computation and thereby improve training performance. We make two key observations: (1) the optimal data-representation granularity for communication may differ from that used by the underlying DNN model implementation, and (2) different parameters can afford different synchronization delays. Based on these observations, we propose a new synchronization mechanism called Priority-based Parameter Propagation (P3). P3 synchronizes parameters at a finer granularity and schedules data transmission so that the training process incurs minimal communication delay. We show that P3 improves the training throughput of ResNet-50, Sockeye, and VGG-19 by as much as 25%, 38%, and 66%, respectively, on clusters with realistic network bandwidth.
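The following is a minimal sketch of the scheduling idea in the abstract, not the authors' implementation: gradients are sliced into fixed-size pieces and ordered by a priority that reflects how soon the next forward pass needs each parameter. The class `P3Scheduler`, the constant `SLICE_BYTES`, and the layer sizes are assumptions made for illustration.

```python
import heapq

SLICE_BYTES = 512 * 1024   # hypothetical slice size; a real system would tune this


class P3Scheduler:
    """Toy model of priority-based parameter propagation.

    Gradients are split into fixed-size slices, and a priority queue orders
    pending slices so that parameters needed earliest in the next forward
    pass (lower layer index) are handed to the network first.
    """

    def __init__(self):
        self._queue = []   # entries: (priority, seq, slice_id, nbytes); priority = layer index
        self._seq = 0      # FIFO tie-breaker among slices of equal priority

    def on_gradient_ready(self, layer_index: int, grad_bytes: int) -> None:
        """Called as each layer's gradient is produced during the backward pass."""
        n_slices = (grad_bytes + SLICE_BYTES - 1) // SLICE_BYTES
        for s in range(n_slices):
            nbytes = min(SLICE_BYTES, grad_bytes - s * SLICE_BYTES)
            heapq.heappush(self._queue, (layer_index, self._seq, s, nbytes))
            self._seq += 1

    def next_transfer(self):
        """Hand the highest-priority pending slice to the communication library."""
        if not self._queue:
            return None
        layer, _, slice_id, nbytes = heapq.heappop(self._queue)
        return layer, slice_id, nbytes


# Gradients become available in reverse layer order during backprop, yet the
# front layers, which the next forward pass consumes first, are sent first.
sched = P3Scheduler()
for layer in reversed(range(10)):
    sched.on_gradient_ready(layer, grad_bytes=3 * SLICE_BYTES)
assert sched.next_transfer()[0] == 0
```

The slicing granularity and the layer-index priority together capture the abstract's two observations: finer slices let high-priority transfers preempt large low-priority tensors already queued, and the priority encodes how much synchronization delay each parameter can afford.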