Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Guy Lemieux

Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training

Sep 23, 2020

Dingqing Yang, Amin Ghasemazar, Xiaowei Ren, Maximilian Golub, Guy Lemieux, Mieszko Lis

Figure 1 for Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training

Figure 2 for Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training

Figure 3 for Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training

Figure 4 for Procrustes: a Dataflow and Accelerator for Sparse Deep Neural Network Training

Abstract:The success of DNN pruning has led to the development of energy-efficient inference accelerators that support pruned models with sparse weight and activation tensors. Because the memory layouts and dataflows in these architectures are optimized for the access patterns during $\mathit{inference}$, however, they do not efficiently support the emerging sparse $\mathit{training}$ techniques. In this paper, we demonstrate (a) that accelerating sparse training requires a co-design approach where algorithms are adapted to suit the constraints of hardware, and (b) that hardware for sparse DNN training must tackle constraints that do not arise in inference accelerators. As proof of concept, we adapt a sparse training algorithm to be amenable to hardware acceleration; we then develop dataflow, data layout, and load-balancing techniques to accelerate it. The resulting system is a sparse DNN training accelerator that produces pruned models with the same accuracy as dense models without first training, then pruning, and finally retraining, a dense model. Compared to training the equivalent unpruned models using a state-of-the-art DNN accelerator without sparse training support, Procrustes consumes up to 3.26$\times$ less energy and offers up to 4$\times$ speedup across a range of models, while pruning weights by an order of magnitude and maintaining unpruned accuracy.

* Appears in the Proceedings of the 53$^\mathit{rd}$ IEEE/ACM International Symposium on Microarchitecture (MICRO 2020)

Via

Access Paper or Ask Questions

DropBack: Continuous Pruning During Training

Jun 11, 2018

Maximilian Golub, Guy Lemieux, Mieszko Lis

Figure 1 for DropBack: Continuous Pruning During Training

Figure 2 for DropBack: Continuous Pruning During Training

Figure 3 for DropBack: Continuous Pruning During Training

Figure 4 for DropBack: Continuous Pruning During Training

Abstract:We introduce a technique that compresses deep neural networks both during and after training by constraining the total number of weights updated during backpropagation to those with the highest total gradients. The remaining weights are forgotten and their initial value is regenerated at every access to avoid storing them in memory. This dramatically reduces the number of off-chip memory accesses during both training and inference, a key component of the energy needs of DNN accelerators. By ensuring that the total weight diffusion remains close to that of baseline unpruned SGD, networks pruned using DropBack are able to maintain high accuracy across network architectures. We observe weight compression of 25x with LeNet-300-100 on MNIST while maintaining accuracy. On CIFAR-10, we see an approximately 5x weight compression on 3 models: an already 9x-reduced VGG-16, Densenet, and WRN-28-10 - all with zero or negligible accuracy loss. On Densenet and WRN, which are particularly challenging to compress, Both Densenet and WRN improve on the state of the art, achieving higher compression with better accuracy than prior pruning techniques.

Via

Access Paper or Ask Questions