Levi Melnick

Microscaling Data Formats for Deep Learning

Oct 19, 2023

Bita Darvish Rouhani, Ritchie Zhao, Ankit More, Mathew Hall, Alireza Khodamoradi, Summer Deng, Dhruv Choudhary, Marius Cornea, Eric Dellinger, Kristof Denolf (+23 more)

Abstract: Narrow bit-width data formats are key to reducing the computational and storage costs of modern deep learning applications. This paper evaluates Microscaling (MX) data formats that combine a per-block scaling factor with narrow floating-point and integer types for individual elements. MX formats balance the competing needs of hardware efficiency, model accuracy, and user friction. Empirical results on over two dozen benchmarks demonstrate practicality of MX data formats as a drop-in replacement for baseline FP32 for AI inference and training with low user friction. We also show the first instance of training generative language models at sub-8-bit weights, activations, and gradients with minimal accuracy loss and no modifications to the training recipe.
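To make the idea of "a per-block scaling factor with narrow ... integer types for individual elements" concrete, here is a minimal sketch in plain Python. It is an illustration only, not the paper's encoding: the function names, the block size of 32, the 8-bit signed elements, and the choice of a power-of-two scale are all assumptions made for this example.

```python
import math

def quantize_block(values, elem_bits=8):
    """Quantize one block to signed integers that share one power-of-two scale.

    Hypothetical helper for illustration; not the paper's exact format.
    Returns (shared_exp, elems) where each value is approximately elem * 2**shared_exp.
    """
    qmax = 2 ** (elem_bits - 1) - 1              # 127 for 8-bit elements
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 0, [0] * len(values)
    # Smallest exponent whose scale keeps the largest magnitude within qmax.
    shared_exp = math.ceil(math.log2(amax / qmax))
    scale = 2.0 ** shared_exp
    # Round to nearest and clamp to guard against floating-point edge cases.
    elems = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return shared_exp, elems

def dequantize_block(shared_exp, elems):
    """Reconstruct approximate real values from a quantized block."""
    scale = 2.0 ** shared_exp
    return [e * scale for e in elems]

def quantize(values, block_size=32):
    """Split a flat list into blocks (assumed size 32) and quantize each one."""
    return [quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]
```

Because each block carries its own scale, a block of small values keeps fine resolution even when another block in the same tensor contains large values. The cost is one extra exponent stored per block rather than per element.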

Shared Microexponents: A Little Shifting Goes a Long Way

Feb 16, 2023

Bita Rouhani, Ritchie Zhao, Venmugil Elango, Rasoul Shafipour, Mathew Hall, Maral Mesmakhosroshahi, Ankit More, Levi Melnick, Maximilian Golub, Girish Varatkar (+12 more)

Abstract: This paper introduces Block Data Representations (BDR), a framework for exploring and evaluating a wide spectrum of narrow-precision formats for deep learning. It enables comparison of popular quantization standards, and through BDR, new formats based on shared microexponents (MX) are identified, which outperform other state-of-the-art quantization approaches, including narrow-precision floating-point and block floating-point. MX utilizes multiple levels of quantization scaling with ultra-fine scaling factors based on shared microexponents in the hardware. The effectiveness of MX is demonstrated on real-world models including large-scale generative pretraining and inferencing, and production-scale recommendation systems.
