Skirmantas Kligys

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

Dec 15, 2017

Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko

Abstract: The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. We propose a quantization scheme that allows inference to be carried out using integer-only arithmetic, which can be implemented more efficiently than floating point inference on commonly available integer-only hardware. We also co-design a training procedure to preserve end-to-end model accuracy post quantization. As a result, the proposed quantization scheme improves the tradeoff between accuracy and on-device latency. The improvements are significant even on MobileNets, a model family known for run-time efficiency, and are demonstrated in ImageNet classification and COCO detection on popular CPUs.

* 14 pages, 12 figures
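
The integer-only arithmetic mentioned in the abstract can be illustrated with a minimal sketch of affine quantization, where a real value r is approximated as r ≈ scale * (q - zero_point) with q an 8-bit integer. This is an illustration under stated assumptions, not the authors' implementation: the function names (choose_qparams, quantize, dequantize, quantized_dot) are hypothetical, the quantization range is taken from the data's min and max, and the final rescaling uses a float multiply for brevity where an integer-only pipeline would use a fixed-point multiplier.

```python
def choose_qparams(rmin, rmax, qmin=0, qmax=255):
    """Pick a scale and zero point mapping [rmin, rmax] onto [qmin, qmax]."""
    # Widen the range to contain 0.0 so real zero maps exactly to an integer.
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)
    if rmax == rmin:
        return 1.0, qmin
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=0, qmax=255):
    """Map real values to clamped integers: q = round(r / scale) + zero_point."""
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point))
            for v in values]


def dequantize(qvalues, scale, zero_point):
    """Recover approximate real values: r = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in qvalues]


def quantized_dot(qa, za, qb, zb):
    """Accumulate a dot product using integer arithmetic only."""
    return sum((x - za) * (y - zb) for x, y in zip(qa, qb))


# Hypothetical example data, chosen only for illustration.
a = [-1.0, 0.0, 0.5, 1.0]
b = [0.2, -0.4, 0.6, 0.8]

sa, za = choose_qparams(min(a), max(a))
sb, zb = choose_qparams(min(b), max(b))
qa = quantize(a, sa, za)
qb = quantize(b, sb, zb)

# Integer accumulation, then a single rescale by the product of the scales.
# (Simplification: a float multiply stands in for a fixed-point multiplier.)
acc = quantized_dot(qa, za, qb, zb)
approx = sa * sb * acc
exact = sum(x * y for x, y in zip(a, b))
print(approx, exact)
```

The accumulation over the inputs touches only integers; the scales enter once at the end, which is what makes this style of quantization attractive on hardware with fast integer units.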
