Abstract: Running Deep Neural Network (DNN) models on devices with limited computational capability is a challenge due to large compute and memory requirements. Quantized Neural Networks (QNNs) have emerged as a potential solution to this problem, promising to offer most of the DNN accuracy benefits at a much lower computational cost. However, harvesting these benefits on existing mobile CPUs is a challenge since operations on highly quantized datatypes are not natively supported in most instruction set architectures (ISAs). In this work, we first describe a streamlining flow to convert all QNN inference operations to integer ones. Afterwards, we provide techniques based on processing one bit position at a time (bit-serial) to show how QNNs can be efficiently deployed using common bitwise operations. We demonstrate the potential of QNNs on mobile CPUs with microbenchmarks and on a quantized AlexNet, which is 3.5x faster than an optimized 8-bit baseline. Our bit-serial matrix multiplication library is available on GitHub at https://git.io/vhshn
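To make the bit-serial idea concrete, here is a minimal C++ sketch of a bit-serial dot product built from AND and popcount, assuming unsigned quantized operands already packed into bit planes (one 64-bit word per plane for brevity). The function name, data layout, and parameters are illustrative assumptions, not the API of the linked library.

```cpp
#include <cstdint>
#include <bitset>

// Dot product of two length-64 vectors whose elements are low-precision
// unsigned integers, stored as bit planes: w_planes[i] holds bit i of every
// weight element, a_planes[j] holds bit j of every activation element.
uint64_t bitserial_dot(const uint64_t* w_planes, int wbits,
                       const uint64_t* a_planes, int abits) {
    uint64_t acc = 0;
    for (int i = 0; i < wbits; ++i) {
        for (int j = 0; j < abits; ++j) {
            // AND keeps positions where both bits are set; popcount sums them.
            uint64_t overlap = std::bitset<64>(w_planes[i] & a_planes[j]).count();
            // Scale each plane-pair contribution by the bits' significance.
            acc += overlap << (i + j);
        }
    }
    return acc;
}
```

The cost of this kernel scales with wbits * abits popcount-accumulate operations, which is why few-bit quantization maps well onto ordinary bitwise instructions even without native low-precision arithmetic support.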
* Presented at the International Workshop on Highly Efficient Neural Networks Design (HENND) co-located with CASES'17