Andrey Melnikov

Quantization-Guided Training for Compact TinyML Models

Mar 10, 2021

Sedigh Ghamari, Koray Ozcan, Thu Dinh, Andrey Melnikov, Juan Carvajal, Jan Ernst, Sek Chai

Abstract: We propose a Quantization-Guided Training (QGT) method that guides DNN training towards optimized low-bit-precision targets and reaches extreme compression levels below 8-bit precision. Unlike standard quantization-aware training (QAT) approaches, QGT uses customized regularization to push weight values towards a distribution that maximizes accuracy while reducing quantization errors. One of the main benefits of this approach is the ability to identify compression bottlenecks. We validate QGT using state-of-the-art model architectures on vision datasets. We also demonstrate the effectiveness of QGT with an 81KB tiny model for person detection, quantized down to 2-bit precision (a 17.7x size reduction), with an accuracy drop of only 3% compared to a floating-point baseline.

* TinyML Summit, March 2021
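
The abstract describes a regularization term that pulls weights towards values that quantize with little error, but it does not give the formula. The sketch below is only an illustration of that general idea, not the QGT regularizer itself: it assumes a symmetric uniform grid and a mean squared distance to the nearest grid level. The names `quantize_uniform` and `quantization_penalty`, and the parameters `bits` and `w_max`, are hypothetical.

```python
def quantize_uniform(w, bits, w_max):
    """Snap w to the nearest level of a symmetric uniform grid on [-w_max, w_max].

    Assumption for illustration: bits >= 2, so the grid has at least
    the three levels -w_max, 0 and +w_max.
    """
    levels = 2 ** (bits - 1) - 1          # number of positive levels
    step = w_max / levels                 # spacing between grid levels
    q = round(w / step)                   # nearest integer level index
    q = max(-levels, min(levels, q))      # clip to the representable range
    return q * step


def quantization_penalty(weights, bits, w_max):
    """Mean squared distance between each weight and its quantized value.

    Added to a task loss with some weighting factor, a term like this
    would reward weights that already sit close to grid levels.
    """
    errors = [(w - quantize_uniform(w, bits, w_max)) ** 2 for w in weights]
    return sum(errors) / len(errors)


# Hypothetical weights: the first set lies on the 2-bit grid, the second does not.
on_grid = [-1.0, 0.0, 1.0]
off_grid = [-0.6, 0.4, 0.9]
penalty_on_grid = quantization_penalty(on_grid, bits=2, w_max=1.0)
penalty_off_grid = quantization_penalty(off_grid, bits=2, w_max=1.0)
```

Under these assumptions the penalty is zero for weights already on the grid and grows as weights drift between levels, which is the kind of signal a quantization-oriented regularizer could feed back into training.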

Subtensor Quantization for Mobilenets

Nov 04, 2020

Thu Dinh, Andrey Melnikov, Vasilios Daskalopoulos, Sek Chai

Abstract: Quantization for deep neural networks (DNN) has enabled developers to deploy models with less memory and more efficient low-power inference. However, not all DNN designs are friendly to quantization. For example, the popular Mobilenet architecture has been tuned to reduce parameter size and computational latency with separable depth-wise convolutions, but not all quantization algorithms work well with it, and accuracy can suffer relative to its floating-point versions. In this paper, we analyze several root causes of quantization loss and propose alternatives that do not rely on per-channel or training-aware approaches. We evaluate the image classification task on the ImageNet dataset, and our post-training quantized 8-bit inference top-1 accuracy is within 0.7% of the floating-point version.

* Embedded Vision Workshop, 16th European Conference on Computer Vision (ECCV), Aug 2020
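
The abstract does not spell out the root causes it analyzes or the proposed algorithm. As a hedged illustration of one commonly cited source of post-training quantization loss, the sketch below shows how a single scale shared across values with very different ranges wastes resolution on the small values, and how giving each group its own scale reduces the error. This is an assumption made for illustration, not the paper's method; the helper names and the toy data are hypothetical.

```python
def quantize_dequantize(values, bits=8):
    """Symmetric quantization with one scale for the whole list,
    followed by dequantization back to floats."""
    qmax = 2 ** (bits - 1) - 1                     # 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)                        # nothing to quantize
    scale = max_abs / qmax                         # float value of one integer step
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mean_squared_error(a, b):
    """Average squared difference between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy data: two groups of values whose ranges differ by a factor of 100.
small = [0.01 * i for i in range(-5, 6)]           # roughly [-0.05, 0.05]
large = [1.0 * i for i in range(-5, 6)]            # [-5, 5]
whole = small + large

# One shared scale: the large group dictates the step size.
err_shared = mean_squared_error(whole, quantize_dequantize(whole))

# One scale per group: each group uses the full 8-bit range.
split = quantize_dequantize(small) + quantize_dequantize(large)
err_split = mean_squared_error(whole, split)
```

With this toy data the shared scale is set by the large group, so the small values collapse onto only a few integer levels, while the per-group scales keep them distinct.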
