Tailin Liang

Pruning and Quantization for Deep Neural Network Acceleration: A Survey

Jan 24, 2021

Tailin Liang, John Glossner, Lei Wang, Shaobo Shi

Abstract: Deep neural networks have been applied in many applications, exhibiting extraordinary abilities in the field of computer vision. However, complex network architectures challenge efficient real-time deployment and require significant computation resources and energy costs. These challenges can be overcome through optimizations such as network compression. This paper provides a survey on two types of network compression: pruning and quantization. We compare current techniques, analyze their strengths and weaknesses, provide guidance for compressing networks, and discuss possible future compression techniques.
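
As a rough illustration of the two compression families named in the abstract, the sketch below applies magnitude-based pruning followed by symmetric 8-bit quantization to a toy weight list. It is a minimal example under stated assumptions, not a method taken from the survey: the function names, the threshold, and the weight values are all hypothetical.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], threshold: float) -> List[float]:
    """Zero out weights whose magnitude is below the (assumed) threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric uniform quantization to the signed range [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map integer codes back to approximate real-valued weights."""
    return [v * scale for v in q]


# Toy weights, illustrative only (not data from the paper).
weights = [0.8, -0.03, 0.25, 0.01, -1.27]
pruned = magnitude_prune(weights, threshold=0.05)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)
```

Pruning removes the two small weights, and quantization stores the rest as integers plus a single scale factor; `restored` differs from `pruned` by at most half a quantization step per weight.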

Dynamic Runtime Feature Map Pruning

Dec 24, 2018

Tailin Liang, Lei Wang, Shaobo Shi, John Glossner

Abstract: High bandwidth requirements are an obstacle to accelerating the training and inference of deep neural networks. Most previous research focuses on reducing the size of kernel maps for inference. We analyze parameter sparsity of six popular convolutional neural networks: AlexNet, MobileNet, ResNet-50, SqueezeNet, TinyNet, and VGG16. Of the networks considered, those using ReLU (AlexNet, SqueezeNet, VGG16) contain a high percentage of 0-valued parameters and can be statically pruned. Networks with non-ReLU activation functions (ResNet-50, TinyNet) may in some cases contain no 0-valued parameters at all. We also investigate runtime feature map usage and find that input feature maps comprise the majority of bandwidth requirements when depth-wise and point-wise convolutions are used. We introduce dynamic runtime pruning of feature maps and show that 10% of dynamic feature map execution can be removed without loss of accuracy. We then extend dynamic pruning to allow for values within an epsilon of zero and show a further 5% reduction of feature map loading with a 1% loss in top-1 accuracy.
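
The idea of skipping feature maps that are zero, or within an epsilon of zero, can be sketched as follows. This is a simplified illustration and not the authors' implementation: it assumes pruning is decided per channel, and the names `is_prunable` and `prune_feature_maps` as well as the toy data are hypothetical.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]  # one channel: rows of activations


def is_prunable(channel: FeatureMap, epsilon: float = 0.0) -> bool:
    """True when every activation in the channel lies within epsilon of zero."""
    return all(abs(v) <= epsilon for row in channel for v in row)


def prune_feature_maps(
    channels: List[FeatureMap], epsilon: float = 0.0
) -> Tuple[List[int], float]:
    """Return the indices of channels to load and the fraction skipped."""
    kept = [i for i, ch in enumerate(channels) if not is_prunable(ch, epsilon)]
    skipped = 1.0 - len(kept) / len(channels) if channels else 0.0
    return kept, skipped


# Toy input, illustrative only (not data from the paper).
maps = [
    [[0.0, 0.0], [0.0, 0.0]],        # exactly zero, e.g. after ReLU
    [[0.5, 0.0], [1.2, 0.3]],        # active channel
    [[0.01, -0.02], [0.0, 0.005]],   # near zero
]
kept_exact, skipped_exact = prune_feature_maps(maps)               # epsilon = 0
kept_eps, skipped_eps = prune_feature_maps(maps, epsilon=0.05)     # relaxed
```

With `epsilon=0.0` only the all-zero channel is skipped; raising epsilon also skips the near-zero channel, which mirrors the trade-off described in the abstract between loading fewer feature maps and losing some accuracy.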
