Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers

Jul 07, 2023

Lakshmi Nair, Mikhail Bernadskiy, Arulselvan Madhavan, Craig Chan, Ayon Basumallik, Darius Bunandar

Share this with someone who'll enjoy it:

Abstract:The recent rise of large language models (LLMs) has resulted in increased efforts towards running LLMs at reduced precision. Running LLMs at lower precision supports resource constraints and furthers their democratization, enabling users to run billion-parameter LLMs on their personal devices. To supplement this ongoing effort, we propose INT-FP-QSim: an open-source simulator that enables flexible evaluation of LLMs and vision transformers at various numerical precisions and formats. INT-FP-QSim leverages existing open-source repositories such as TensorRT, QPytorch and AIMET for a combined simulator that supports various floating point and integer formats. With the help of our simulator, we survey the impact of different numerical formats on the performance of LLMs and vision transformers at 4-bit weights and 4-bit or 8-bit activations. We also compare recently proposed methods like Adaptive Block Floating Point, SmoothQuant, GPTQ and RPTQ on the model performances. We hope INT-FP-QSim will enable researchers to flexibly simulate models at various precisions to support further research in quantization of LLMs and vision transformers.

* This report is supplementary material to the open-source code available at: https://github.com/lightmatter-ai/INT-FP-QSim

View paper on

Share this with someone who'll enjoy it:

Title:INT-FP-QSim: Mixed Precision and Formats For Large Language Models and Vision Transformers

Paper and Code