Title: S4: a High-sparsity, High-performance AI Accelerator

Jul 16, 2022

Ian En-Hsu Yen, Zhibin Xiao, Dongkuan Xu

Abstract: Exploiting the sparsity underlying neural networks has become one of the most promising approaches to reducing the memory footprint, I/O cost, and computation workload during inference. The degree of sparsity one can exploit has grown as model sizes have increased with the trend toward pre-training giant models. However, unlike quantization, which is widely supported, acceleration through high-degree sparsity is not supported on most computing platforms. In this work, we introduce S4, the first commercial hardware platform supporting high-degree sparsity acceleration of up to 32 times. Combined with state-of-the-art sparse pruning techniques, we demonstrate a severalfold practical inference speedup on S4 over mainstream inference platforms such as the Nvidia T4. We also show that, in practice, a larger sparse model can achieve both higher accuracy and higher throughput on S4 than a smaller dense model.
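As a rough illustration of why a 32-times sparsity factor can reduce computation, the sketch below prunes a small random weight matrix by magnitude, stores the survivors in compressed sparse row (CSR) form, and counts multiply-accumulate operations in a matrix-vector product. This is not the S4 hardware's method or the pruning technique used in the paper: the function names, the matrix size, and the choice of unstructured magnitude pruning are all assumptions made for illustration, and the operation count is an idealized figure, not a measured speedup.

```python
import random


def magnitude_prune(weights, factor):
    """Keep the largest-magnitude 1/factor of the entries; zero the rest.

    Illustrative unstructured pruning only (an assumption, not the paper's method).
    """
    rows, cols = len(weights), len(weights[0])
    ranked = sorted(
        ((abs(weights[r][c]), r, c) for r in range(rows) for c in range(cols)),
        reverse=True,
    )
    keep = max(1, (rows * cols) // factor)
    pruned = [[0.0] * cols for _ in range(rows)]
    for _, r, c in ranked[:keep]:
        pruned[r][c] = weights[r][c]
    return pruned


def to_csr(matrix):
    """Convert a dense list-of-lists matrix to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for c, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(c)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Multiply a CSR matrix by a vector, counting multiply-accumulates."""
    y, macs = [], 0
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
            macs += 1
        y.append(acc)
    return y, macs


random.seed(0)
rows, cols, factor = 64, 128, 32  # hypothetical layer shape
dense = [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]
x = [random.gauss(0, 1) for _ in range(cols)]

pruned = magnitude_prune(dense, factor)
values, col_idx, row_ptr = to_csr(pruned)
y_sparse, sparse_macs = csr_matvec(values, col_idx, row_ptr, x)
dense_macs = rows * cols

print("dense MACs:", dense_macs)
print("sparse MACs:", sparse_macs)
print("ideal compute reduction:", dense_macs / sparse_macs)
```

Note that CSR also stores a column index per nonzero plus the row pointers, so the memory saving in this format is smaller than the reduction in multiply-accumulates. Turning the reduced operation count into wall-clock speedup depends on hardware support, which is the gap the abstract describes.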

* 7 pages, 3 figures, SNN Workshop 2022
