Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Miloš Nikolić

Schrödinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training

Apr 28, 2022

Miloš Nikolić, Enrique Torres Sanchez, Jiahui Wang, Ali Hadi Zadeh, Mostafa Mahmoud, Ameer Abdelhadi, Andreas Moshovos

Figure 1 for Schrödinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training

Figure 2 for Schrödinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training

Figure 3 for Schrödinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training

Figure 4 for Schrödinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training

Abstract:We introduce a software-hardware co-design approach to reduce memory traffic and footprint during training with BFloat16 or FP32 boosting energy efficiency and execution time performance. We introduce methods to dynamically adjust the size and format of the floating-point containers used to store activations and weights during training. The different value distributions lead us to different approaches for exponents and mantissas. Gecko exploits the favourable exponent distribution with a loss-less delta encoding approach to reduce the total exponent footprint by up to $58\%$ in comparison to a 32 bit floating point baseline. To content with the noisy mantissa distributions, we present two lossy methods to eliminate as many as possible least significant bits while not affecting accuracy. Quantum Mantissa, is a machine learning-first mantissa compression method that taps on training's gradient descent algorithm to also learn minimal mantissa bitlengths on a per-layer granularity, and obtain up to $92\%$ reduction in total mantissa footprint. Alternatively, BitChop observes changes in the loss function during training to adjust mantissa bit-length network-wide yielding a reduction of $81\%$ in footprint. Schr\"{o}dinger's FP implements hardware encoders/decoders that guided by Gecko/Quantum Mantissa or Gecko/BitChop transparently encode/decode values when transferring to/from off-chip memory boosting energy efficiency and reducing execution time.

Via

Access Paper or Ask Questions

BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization

Feb 08, 2020

Miloš Nikolić, Ghouthi Boukli Hacene, Ciaran Bannon, Alberto Delmas Lascorz, Matthieu Courbariaux, Yoshua Bengio, Vincent Gripon, Andreas Moshovos

Figure 1 for BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization

Figure 2 for BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization

Figure 3 for BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization

Figure 4 for BitPruning: Learning Bitlengths for Aggressive and Accurate Quantization

Abstract:Neural networks have demonstrably achieved state-of-the art accuracy using low-bitlength integer quantization, yielding both execution time and energy benefits on existing hardware designs that support short bitlengths. However, the question of finding the minimum bitlength for a desired accuracy remains open. We introduce a training method for minimizing inference bitlength at any granularity while maintaining accuracy. Furthermore, we propose a regularizer that penalizes large bitlength representations throughout the architecture and show how it can be modified to minimize other quantifiable criteria, such as number of operations or memory footprint. We demonstrate that our method learns thrifty representations while maintaining accuracy. With ImageNet, the method produces an average per layer bitlength of 4.13 and 3.76 bits on AlexNet and ResNet18 respectively, remaining within 2.0% and 0.5% of the baseline TOP-1 accuracy.

Via

Access Paper or Ask Questions