
Dong Hyun Lee

NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference

Dec 03, 2021

Joonsang Yu, Junki Park, Seongmin Park, Minsoo Kim, Sihwa Lee, Dong Hyun Lee, Jungwook Choi

Figures 1 to 4 for NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference

Abstract: Non-linear operations such as GELU, layer normalization, and Softmax are essential yet costly building blocks of Transformer models. Several prior works simplified these operations with look-up tables (LUTs) or integer computations, but such approximations suffer from inferior accuracy or incur considerable hardware cost and long latency. This paper proposes an accurate and hardware-friendly approximation framework for efficient Transformer inference. Our framework employs a simple neural network as a universal approximator and equivalently transforms its structure into a LUT. The proposed framework, called NN-LUT, can accurately replace all the non-linear operations in popular BERT models with significant reductions in area, power consumption, and latency.
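The abstract's central idea, a small neural network whose structure is equivalently transformed into a LUT, can be illustrated for a scalar input. The sketch below assumes a single hidden layer of ReLU neurons, y = sum_i c_i * relu(n_i * x + b_i) + d. This is an assumption made for illustration; see the paper for the actual network configuration. Each neuron switches on or off at x = -b_i / n_i, so the network is exactly piecewise linear and can be stored as sorted breakpoints plus one (slope, intercept) pair per segment. All function names (`relu_net`, `relu_net_to_lut`, `lut_eval`, `gelu_relu_net`) are hypothetical. `gelu_relu_net` replaces training with direct interpolation of GELU at fixed knots, so it is a stand-in and not the paper's training procedure.

```python
import bisect
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu_net(x, n, b, c, d):
    """One-hidden-layer ReLU network: y = sum_i c_i * relu(n_i * x + b_i) + d."""
    return d + sum(ci * max(0.0, ni * x + bi) for ni, bi, ci in zip(n, b, c))


def relu_net_to_lut(n, b, c, d):
    """Rewrite the network as breakpoints plus one (slope, intercept) per segment.

    Between consecutive breakpoints the set of active neurons is fixed,
    so the network output is exactly linear there.
    """
    const = d
    bps = []
    for ni, bi, ci in zip(n, b, c):
        if ni == 0.0:
            const += ci * max(0.0, bi)  # input-independent neuron folds into the bias
        else:
            bps.append(-bi / ni)  # x where this neuron switches on or off
    bps.sort()
    # One probe point strictly inside each of the len(bps) + 1 segments.
    if not bps:
        probes = [0.0]
    else:
        probes = [bps[0] - 1.0]
        probes += [(lo + hi) / 2.0 for lo, hi in zip(bps, bps[1:])]
        probes.append(bps[-1] + 1.0)
    slopes, intercepts = [], []
    for p in probes:
        s, t = 0.0, const
        for ni, bi, ci in zip(n, b, c):
            if ni != 0.0 and ni * p + bi > 0.0:  # neuron is active on this segment
                s += ci * ni
                t += ci * bi
        slopes.append(s)
        intercepts.append(t)
    return bps, slopes, intercepts


def lut_eval(x, bps, slopes, intercepts):
    """Evaluate with one segment search, one multiply and one add."""
    k = bisect.bisect_right(bps, x)
    return slopes[k] * x + intercepts[k]


def gelu_relu_net(knots):
    """Hypothetical stand-in for training: ReLU weights that interpolate GELU at the knots."""
    g = [gelu(t) for t in knots]
    seg = [(g[k + 1] - g[k]) / (knots[k + 1] - knots[k]) for k in range(len(knots) - 1)]
    seg.append(1.0)  # GELU(x) approaches x for large x, so use slope 1 past the last knot
    n = [1.0] * len(knots)
    b = [-t for t in knots]
    c = [seg[0]] + [seg[k] - seg[k - 1] for k in range(1, len(seg))]
    return n, b, c, g[0]


# Usage: 17 knots on [-4, 4], checked on a grid over [-6, 6].
knots = [-4.0 + 0.5 * k for k in range(17)]
n, b, c, d = gelu_relu_net(knots)
bps, slopes, intercepts = relu_net_to_lut(n, b, c, d)
xs = [-6.0 + 0.01 * i for i in range(1201)]
# Network and LUT should agree up to floating-point rounding.
equiv_err = max(abs(relu_net(x, n, b, c, d) - lut_eval(x, bps, slopes, intercepts)) for x in xs)
# Distance of the piecewise-linear LUT from the exact GELU on the grid.
approx_err = max(abs(gelu(x) - lut_eval(x, bps, slopes, intercepts)) for x in xs)
print(f"network vs LUT: {equiv_err:.2e}, GELU vs LUT: {approx_err:.2e}")
```

The first error measures the transformation, which should be exact up to rounding. The second measures only this toy knot placement and says nothing about the accuracy reported in the paper.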

* 7 pages, 3 figures
