Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Joel S. Emer

HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity

May 22, 2023

Yannan Nellie Wu, Po-An Tsai, Saurav Muralidharan, Angshuman Parashar, Vivienne Sze, Joel S. Emer

Figure 1 for HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity

Figure 2 for HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity

Figure 3 for HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity

Figure 4 for HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity

Abstract:Due to complex interactions among various deep neural network (DNN) optimization techniques, modern DNNs can have weights and activations that are dense or sparse with diverse sparsity degrees. To offer a good trade-off between accuracy and hardware performance, an ideal DNN accelerator should have high flexibility to efficiently translate DNN sparsity into reductions in energy and/or latency without incurring significant complexity overhead. This paper introduces hierarchical structured sparsity (HSS), with the key insight that we can systematically represent diverse sparsity degrees by having them hierarchically composed from multiple simple sparsity patterns. As a result, HSS simplifies the underlying hardware since it only needs to support simple sparsity patterns; this significantly reduces the sparsity acceleration overhead, which improves efficiency. Motivated by such opportunities, we propose a simultaneously efficient and flexible accelerator, named HighLight, to accelerate DNNs that have diverse sparsity degrees (including dense). Due to the flexibility of HSS, different HSS patterns can be introduced to DNNs to meet different applications' accuracy requirements. Compared to existing works, HighLight achieves a geomean of up to 6.4x better energy-delay product (EDP) across workloads with diverse sparsity degrees, and always sits on the EDP-accuracy Pareto frontier for representative DNNs.

* 12 pages, 17 figures

Via

Access Paper or Ask Questions

Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling

May 12, 2022

Yannan Nellie Wu, Po-An Tsai, Angshuman Parashar, Vivienne Sze, Joel S. Emer

Figure 1 for Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling

Figure 2 for Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling

Figure 3 for Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling

Figure 4 for Sparseloop: An Analytical Approach To Sparse Tensor Accelerator Modeling

Abstract:In recent years, many accelerators have been proposed to efficiently process sparse tensor algebra applications (e.g., sparse neural networks). However, these proposals are single points in a large and diverse design space. The lack of systematic description and modeling support for these sparse tensor accelerators impedes hardware designers from efficient and effective design space exploration. This paper first presents a unified taxonomy to systematically describe the diverse sparse tensor accelerator design space. Based on the proposed taxonomy, it then introduces Sparseloop, the first fast, accurate, and flexible analytical modeling framework to enable early-stage evaluation and exploration of sparse tensor accelerators. Sparseloop comprehends a large set of architecture specifications, including various dataflows and sparse acceleration features (e.g., elimination of zero-based compute). Using these specifications, Sparseloop evaluates a design's processing speed and energy efficiency while accounting for data movement and compute incurred by the employed dataflow as well as the savings and overhead introduced by the sparse acceleration features using stochastic tensor density models. Across representative accelerators and workloads, Sparseloop achieves over 2000 times faster modeling speed than cycle-level simulations, maintains relative performance trends, and achieves 0.1% to 8% average error. With a case study, we demonstrate Sparseloop's ability to help reveal important insights for designing sparse tensor accelerators (e.g., it is important to co-design orthogonal design aspects).

Via

Access Paper or Ask Questions