Mark Horeni

Improvements in Interlayer Pipelining of CNN Accelerators Using Genetic Algorithms

Nov 20, 2023

Mark Horeni, Siddharth Joshi

Abstract: Deploying Convolutional Neural Networks (CNNs) on edge platforms necessitates efficient hardware acceleration. Any unnecessary data movement in such accelerators can unacceptably degrade performance and efficiency. To address this, we develop a layer fusion technique targeting CNNs that reduces off-chip data communication using a Genetic Algorithm (GA) applied to graph-based topological sort. Results show a 1.8$\times$ increase in energy efficiency and a 1.9$\times$ improvement in energy-delay product (EDP) for MobileNet-v3 on a SIMBA-like mobile architecture. Our approach consistently improves workload performance, averaging a 1.4$\times$ improvement in EDP for SIMBA and 1.12$\times$ for Eyeriss.
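To illustrate the general idea of searching over topological orderings of a layer graph, here is a minimal sketch. It is not the authors' implementation: the toy graph, the `cost` function (total producer-to-consumer distance in the schedule, used as a stand-in for off-chip traffic), and all function names are assumptions made for illustration. The search is mutation-only with no crossover, so it is a simplified evolutionary search, not a full GA.

```python
import random

# Hypothetical toy layer graph: node -> list of consumers (a DAG with a branch).
GRAPH = {
    "conv1": ["conv2a", "conv2b"],
    "conv2a": ["conv3a"],
    "conv2b": ["add"],
    "conv3a": ["add"],
    "add": ["fc"],
    "fc": [],
}

def random_topological_order(graph, rng):
    """Kahn's algorithm with a random choice among the ready nodes."""
    indegree = {n: 0 for n in graph}
    for consumers in graph.values():
        for c in consumers:
            indegree[c] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    order = []
    while ready:
        node = ready.pop(rng.randrange(len(ready)))
        order.append(node)
        for c in graph[node]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return order

def is_valid(order, graph):
    """True if every producer appears before each of its consumers."""
    pos = {n: i for i, n in enumerate(order)}
    return all(pos[p] < pos[c] for p in graph for c in graph[p])

def cost(order, graph):
    """Placeholder cost (assumption): total producer-to-consumer distance."""
    pos = {n: i for i, n in enumerate(order)}
    return sum(pos[c] - pos[p] for p in graph for c in graph[p])

def mutate(order, graph, rng):
    """Swap two adjacent layers; keep the swap only if dependencies still hold."""
    child = list(order)
    i = rng.randrange(len(child) - 1)
    child[i], child[i + 1] = child[i + 1], child[i]
    return child if is_valid(child, graph) else list(order)

def evolve(graph, generations=50, population_size=8, seed=0):
    """Keep the cheaper half each generation and refill with mutated copies."""
    rng = random.Random(seed)
    population = [random_topological_order(graph, rng)
                  for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=lambda o: cost(o, graph))
        survivors = population[: population_size // 2]
        children = [mutate(rng.choice(survivors), graph, rng) for _ in survivors]
        population = survivors + children
    return min(population, key=lambda o: cost(o, graph))

best = evolve(GRAPH)
print(best, cost(best, GRAPH))
```

In a real flow, the placeholder `cost` would presumably be replaced by an energy or EDP estimate from an accelerator model; the abstract does not specify how candidates are scored.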

The Hardware Impact of Quantization and Pruning for Weights in Spiking Neural Networks

Feb 08, 2023

Clemens JS Schaefer, Pooria Taheri, Mark Horeni, Siddharth Joshi

Abstract: Energy-efficient implementations and deployments of spiking neural networks (SNNs) have been of great interest due to the possibility of developing artificial systems that can achieve the computational power and energy efficiency of the biological brain. Efficient implementations of SNNs on modern digital hardware are also inspired by advances in machine learning and deep neural networks (DNNs). Two techniques widely employed in the efficient deployment of DNNs, the quantization and pruning of parameters, can both compress the model size, reduce memory footprints, and facilitate low-latency execution. The interaction between quantization and pruning, and how they might impact model performance on SNN accelerators, is currently unknown. We study pruning and quantization applied in isolation, cumulatively, and simultaneously (jointly) to a state-of-the-art SNN targeting gesture recognition for dynamic vision sensor (DVS) cameras. We show that this state-of-the-art model is amenable to aggressive parameter quantization, suffering no loss in accuracy down to ternary weights. However, pruning maintains iso-accuracy only up to 80% sparsity, which requires 45% more energy than the best quantization on our architectural model. Applying both pruning and quantization can incur an accuracy loss in exchange for a favourable trade-off on the energy-accuracy Pareto frontier for the given hardware configuration.

* Code https://github.com/Intelligent-Microsystems-Lab/SNNQuantPrune
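For readers unfamiliar with the two compression techniques named in the abstract, the following is a minimal sketch of ternary weight quantization and magnitude pruning on a flat list of weights. It is independent of the linked repository and is not the authors' method: the function names, the example weights, and the thresholding heuristic (`threshold_ratio` times the mean absolute weight) are all assumptions chosen for illustration.

```python
def ternarize(weights, threshold_ratio=0.7):
    """Map each weight to a code in {-1, 0, +1} plus one shared scale.

    Assumed heuristic: weights with |w| <= threshold_ratio * mean(|w|)
    become 0; the scale is the mean magnitude of the remaining weights.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    threshold = threshold_ratio * mean_abs
    codes = [0 if abs(w) <= threshold else (1 if w > 0 else -1)
             for w in weights]
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude weights until `sparsity` is reached."""
    n_prune = int(round(sparsity * len(weights)))
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in by_magnitude[:n_prune]:
        pruned[i] = 0.0
    return pruned

# Hypothetical example weights.
weights = [0.9, -0.05, 0.4, -0.7, 0.02, 0.1, -0.3, 0.6, -0.01, 0.2]

pruned = magnitude_prune(weights, sparsity=0.8)  # 80% of entries set to zero
codes, scale = ternarize(weights)                # ternary codes and shared scale
print(pruned)
print(codes, scale)
```

Applying `ternarize` to the output of `magnitude_prune` would correspond loosely to a cumulative (prune, then quantize) setting; the joint setting in the abstract involves training-time interaction that this sketch does not model.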
