Title: BiT: Robustly Binarized Multi-distilled Transformer

May 25, 2022

Zechun Liu, Barlas Oguz, Aasish Pappu, Lin Xiao, Scott Yih, Meng Li, Raghuraman Krishnamoorthi, Yashar Mehdad

Abstract: Modern pre-trained transformers have rapidly advanced the state of the art in machine learning, but have also grown in parameters and computational complexity, making them increasingly difficult to deploy in resource-constrained environments. Binarization of the weights and activations of the network can significantly alleviate these issues, but it is technically challenging from an optimization perspective. In this work, we identify a series of improvements that enable binary transformers at a much higher accuracy than was previously possible. These include a two-set binarization scheme, a novel elastic binary activation function with learned parameters, and a method to quantize a network to its limit by successively distilling higher-precision models into lower-precision students. These approaches allow, for the first time, fully binarized transformer models that are at a practical level of accuracy, approaching a full-precision BERT baseline on the GLUE language understanding benchmark within as little as 5.9%.
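To make the idea of binarizing into two different value sets more concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names, the mean-absolute-value scale for the symmetric case, and the fixed `alpha` (scale) and `beta` (threshold) values are all assumptions for illustration. In an elastic activation of the kind the abstract describes, such parameters would be learned during training rather than set by hand.

```python
from typing import List


def binarize_symmetric(values: List[float]) -> List[float]:
    """Map each value to {-alpha, +alpha}.

    Assumption: alpha is the mean absolute value of the inputs,
    a common choice for scaling binary weights.
    """
    alpha = sum(abs(v) for v in values) / len(values)
    return [alpha if v >= 0 else -alpha for v in values]


def binarize_nonnegative(values: List[float], alpha: float, beta: float) -> List[float]:
    """Map each value to {0, alpha}.

    Hypothetical elastic form: shift by beta, divide by alpha,
    clip to [0, 1], round to the nearest of {0, 1}, then rescale by alpha.
    """
    out = []
    for v in values:
        scaled = min(max((v - beta) / alpha, 0.0), 1.0)
        # Explicit threshold instead of round(), which rounds 0.5 to 0 in Python.
        out.append(alpha if scaled >= 0.5 else 0.0)
    return out


if __name__ == "__main__":
    print(binarize_symmetric([0.5, -1.5, 1.0, -1.0]))
    print(binarize_nonnegative([0.1, 0.5, 0.9], alpha=0.8, beta=0.0))
```

The symmetric set suits values that are roughly centered on zero, while the non-negative set suits values that cannot be negative, such as outputs of a softmax or ReLU. Which tensors use which set is a design decision the paper itself specifies; this sketch only shows the two mappings.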

View paper on OpenReview
