Title: Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization

Nov 07, 2018

H. T. Kung, Bradley McDanel, Sai Qian Zhang

Abstract: This paper describes a novel approach to packing sparse convolutional neural networks for efficient systolic array implementations. By combining subsets of columns in the original filter matrix associated with a convolutional layer, we increase the utilization efficiency of the systolic array substantially (e.g., ~4x) due to the increased density of nonzeros in the resulting packed filter matrix. In combining columns, for each row, all filter weights except the one with the largest magnitude are pruned. We retrain the remaining weights to preserve high accuracy. We demonstrate that, to mitigate data privacy concerns, the retraining can be accomplished with only a fraction of the original dataset (e.g., 10% for CIFAR-10). We study the effectiveness of this joint optimization for both high utilization and classification accuracy with ASIC and FPGA designs based on efficient bit-serial implementations of multiplier-accumulators. We present analysis and empirical evidence on the superior performance of our column combining approach against prior art under metrics such as energy efficiency (3x) and inference latency (12x).
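The row-wise pruning rule in the abstract can be illustrated with a small NumPy sketch. This is only an illustration under stated assumptions, not the authors' implementation: the function names `combine_columns` and `density`, the toy matrix `W`, and the hand-picked column groups are all hypothetical, and the paper's own procedure for choosing which columns to group, as well as the retraining step, are not shown.

```python
import numpy as np

def combine_columns(filter_matrix, groups):
    """Pack each group of columns into a single column.

    For every row, only the largest-magnitude weight within a group is
    kept; the other weights of that row in the group are pruned.
    `groups` is assumed to be given (hypothetical input); how to choose
    the groups is not covered by this sketch.
    """
    rows = filter_matrix.shape[0]
    row_idx = np.arange(rows)
    packed = np.zeros((rows, len(groups)), dtype=filter_matrix.dtype)
    # source[r, g] records which original column supplied packed[r, g].
    source = np.zeros((rows, len(groups)), dtype=int)
    for g, cols in enumerate(groups):
        cols = np.asarray(cols)
        sub = filter_matrix[:, cols]
        winner = np.argmax(np.abs(sub), axis=1)  # per-row largest magnitude
        packed[:, g] = sub[row_idx, winner]
        source[:, g] = cols[winner]
    return packed, source

def density(matrix):
    """Fraction of entries that are nonzero."""
    return np.count_nonzero(matrix) / matrix.size

# Toy sparse filter matrix (made-up values for illustration only).
W = np.array([
    [0.0, 0.5, 0.0, 0.0],
    [-0.7, 0.1, 0.2, 0.0],   # -0.7 and 0.1 conflict in group [0, 1]
    [0.0, 0.0, 0.0, 0.9],
    [0.3, 0.0, 0.0, -0.1],
])
groups = [[0, 1], [2, 3]]    # hypothetical grouping

packed, source = combine_columns(W, groups)
print(packed)
print(density(W), density(packed))
```

In this toy case the only conflict is in the second row of the first group, where the smaller weight 0.1 is pruned and -0.7 is kept. The packed matrix has half as many columns and a higher density of nonzeros, which is the property the abstract credits for the improved systolic array utilization.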

* To appear in ASPLOS 2019
