Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Edward Hanson

Biologically Plausible Learning on Neuromorphic Hardware Architectures

Dec 29, 2022

Christopher Wolters, Brady Taylor, Edward Hanson, Xiaoxuan Yang, Ulf Schlichtmann, Yiran Chen

Figure 1 for Biologically Plausible Learning on Neuromorphic Hardware Architectures

Figure 2 for Biologically Plausible Learning on Neuromorphic Hardware Architectures

Figure 3 for Biologically Plausible Learning on Neuromorphic Hardware Architectures

Figure 4 for Biologically Plausible Learning on Neuromorphic Hardware Architectures

Abstract:With an ever-growing number of parameters defining increasingly complex networks, Deep Learning has led to several breakthroughs surpassing human performance. As a result, data movement for these millions of model parameters causes a growing imbalance known as the memory wall. Neuromorphic computing is an emerging paradigm that confronts this imbalance by performing computations directly in analog memories. On the software side, the sequential Backpropagation algorithm prevents efficient parallelization and thus fast convergence. A novel method, Direct Feedback Alignment, resolves inherent layer dependencies by directly passing the error from the output to each layer. At the intersection of hardware/software co-design, there is a demand for developing algorithms that are tolerable to hardware nonidealities. Therefore, this work explores the interrelationship of implementing bio-plausible learning in-situ on neuromorphic hardware, emphasizing energy, area, and latency constraints. Using the benchmarking framework DNN+NeuroSim, we investigate the impact of hardware nonidealities and quantization on algorithm performance, as well as how network topologies and algorithm-level design choices can scale latency, energy and area consumption of a chip. To the best of our knowledge, this work is the first to compare the impact of different learning algorithms on Compute-In-Memory-based hardware and vice versa. The best results achieved for accuracy remain Backpropagation-based, notably when facing hardware imperfections. Direct Feedback Alignment, on the other hand, allows for significant speedup due to parallelization, reducing training time by a factor approaching N for N-layered networks.

Via

Access Paper or Ask Questions

PENNI: Pruned Kernel Sharing for Efficient CNN Inference

May 14, 2020

Shiyu Li, Edward Hanson, Hai Li, Yiran Chen

Figure 1 for PENNI: Pruned Kernel Sharing for Efficient CNN Inference

Figure 2 for PENNI: Pruned Kernel Sharing for Efficient CNN Inference

Figure 3 for PENNI: Pruned Kernel Sharing for Efficient CNN Inference

Figure 4 for PENNI: Pruned Kernel Sharing for Efficient CNN Inference

Abstract:Although state-of-the-art (SOTA) CNNs achieve outstanding performance on various tasks, their high computation demand and massive number of parameters make it difficult to deploy these SOTA CNNs onto resource-constrained devices. Previous works on CNN acceleration utilize low-rank approximation of the original convolution layers to reduce computation cost. However, these methods are very difficult to conduct upon sparse models, which limits execution speedup since redundancies within the CNN model are not fully exploited. We argue that kernel granularity decomposition can be conducted with low-rank assumption while exploiting the redundancy within the remaining compact coefficients. Based on this observation, we propose PENNI, a CNN model compression framework that is able to achieve model compactness and hardware efficiency simultaneously by (1) implementing kernel sharing in convolution layers via a small number of basis kernels and (2) alternately adjusting bases and coefficients with sparse constraints. Experiments show that we can prune 97% parameters and 92% FLOPs on ResNet18 CIFAR10 with no accuracy loss, and achieve 44% reduction in run-time memory consumption and a 53% reduction in inference latency.

Via

Access Paper or Ask Questions