Abstract:Spiking Neural Networks (SNNs) are promising bio-inspired third-generation neural networks. Recent research has trained deep SNN models with accuracy on par with Artificial Neural Networks (ANNs). Although the event-driven and sparse nature of SNNs show potential for more energy efficient computation than ANNs, SNN neurons have internal states which evolve over time. Keeping track of SNN states can significantly increase data movement and storage requirements, potentially losing its advantages with respect to ANNs. This paper investigates the energy effects of having neuron states, and how it is influenced by the chosen mapping to realistic hardware architectures with advanced memory hierarchies. Therefore, we develop STEMS, a mapping design space exploration tool for SNNs. STEMS models SNN's stateful behavior and explores intra-layer and inter-layer mapping optimizations to minimize data movement, considering both spatial and temporal SNN dimensions. Using STEMS, we show up to 12x reduction in off-chip data movement and 5x reduction in energy (on top of intra-layer optimizations), on two event-based vision SNN benchmarks. Finally, neuron states may not be needed for all SNN layers. By optimizing neuron states for one of our benchmarks, we show 20x reduction in neuron states and 1.4x better performance without accuracy loss.
Abstract:Synaptic delay parameterization of neural network models have remained largely unexplored but recent literature has been showing promising results, suggesting the delay parameterized models are simpler, smaller, sparser, and thus more energy efficient than similar performing (e.g. task accuracy) non-delay parameterized ones. We introduce Shared Circular Delay Queue (SCDQ), a novel hardware structure for supporting synaptic delays on digital neuromorphic accelerators. Our analysis and hardware results show that it scales better in terms of memory, than current commonly used approaches, and is more amortizable to algorithm-hardware co-optimizations, where in fact, memory scaling is modulated by model sparsity and not merely network size. Next to memory we also report performance on latency area and energy per inference.
Abstract:This work evaluates a forward-only learning algorithm on the MNIST dataset with hardware-in-the-loop training of a 4f optical correlator, achieving 87.6% accuracy with O(n2) complexity, compared to backpropagation, which achieves 88.8% accuracy with O(n2 log n) complexity.
Abstract:Recurrent Neural Networks (RNNs) are widely used for sequential processing but face fundamental limitations with continual inference due to state saturation, requiring disruptive hidden state resets. However, reset-based methods impose synchronization requirements with input boundaries and increase computational costs at inference. To address this, we propose an adaptive loss function that eliminates the need for resets during inference while preserving high accuracy over extended sequences. By combining cross-entropy and Kullback-Leibler divergence, the loss dynamically modulates the gradient based on input informativeness, allowing the network to differentiate meaningful data from noise and maintain stable representations over time. Experimental results demonstrate that our reset-free approach outperforms traditional reset-based methods when applied to a variety of RNNs, particularly in continual tasks, enhancing both the theoretical and practical capabilities of RNNs for streaming applications.
Abstract:This paper analyzes the design and competitiveness of four neural network (NN) architectures recently proposed as decoders for forward error correction (FEC) codes. We first consider the so-called single-label neural network (SLNN) and the multi-label neural network (MLNN) decoders which have been reported to achieve near maximum likelihood (ML) performance. Here, we show analytically that SLNN and MLNN decoders can always achieve ML performance, regardless of the code dimensions -- although at the cost of computational complexity -- and no training is in fact required. We then turn our attention to two transformer-based decoders: the error correction code transformer (ECCT) and the cross-attention message passing transformer (CrossMPT). We compare their performance against traditional decoders, and show that ordered statistics decoding outperforms these transformer-based decoders. The results in this paper cast serious doubts on the application of NN-based FEC decoders in the short and medium block length regime.
Abstract:While the human brain efficiently adapts to new tasks from a continuous stream of information, neural network models struggle to learn from sequential information without catastrophically forgetting previously learned tasks. This limitation presents a significant hurdle in deploying edge devices in real-world scenarios where information is presented in an inherently sequential manner. Active dendrites of pyramidal neurons play an important role in the brain ability to learn new tasks incrementally. By exploiting key properties of time-to-first-spike encoding and leveraging its high sparsity, we present a novel spiking neural network model enhanced with active dendrites. Our model can efficiently mitigate catastrophic forgetting in temporally-encoded SNNs, which we demonstrate with an end-of-training accuracy across tasks of 88.3% on the test set using the Split MNIST dataset. Furthermore, we provide a novel digital hardware architecture that paves the way for real-world deployment in edge devices. Using a Xilinx Zynq-7020 SoC FPGA, we demonstrate a 100-% match with our quantized software model, achieving an average inference time of 37.3 ms and an 80.0% accuracy.
Abstract:Configurable synaptic delays are a basic feature in many neuromorphic neural network hardware accelerators. However, they have been rarely used in model implementations, despite their promising impact on performance and efficiency in tasks that exhibit complex (temporal) dynamics, as it has been unclear how to optimize them. In this work, we propose a framework to train and deploy, in digital neuromorphic hardware, highly performing spiking neural network models (SNNs) where apart from the synaptic weights, the per-synapse delays are also co-optimized. Leveraging spike-based back-propagation-through-time, the training accounts for both platform constraints, such as synaptic weight precision and the total number of parameters per core, as a function of the network size. In addition, a delay pruning technique is used to reduce memory footprint with a low cost in performance. We evaluate trained models in two neuromorphic digital hardware platforms: Intel Loihi and Imec Seneca. Loihi offers synaptic delay support using the so-called Ring-Buffer hardware structure. Seneca does not provide native hardware support for synaptic delays. A second contribution of this paper is therefore a novel area- and memory-efficient hardware structure for acceleration of synaptic delays, which we have integrated in Seneca. The evaluated benchmark involves several models for solving the SHD (Spiking Heidelberg Digits) classification task, where minimal accuracy degradation during the transition from software to hardware is demonstrated. To our knowledge, this is the first work showcasing how to train and deploy hardware-aware models parameterized with synaptic delays, on multicore neuromorphic hardware accelerators.
Abstract:The field of neuromorphic computing holds great promise in terms of advancing computing efficiency and capabilities by following brain-inspired principles. However, the rich diversity of techniques employed in neuromorphic research has resulted in a lack of clear standards for benchmarking, hindering effective evaluation of the advantages and strengths of neuromorphic methods compared to traditional deep-learning-based methods. This paper presents a collaborative effort, bringing together members from academia and the industry, to define benchmarks for neuromorphic computing: NeuroBench. The goals of NeuroBench are to be a collaborative, fair, and representative benchmark suite developed by the community, for the community. In this paper, we discuss the challenges associated with benchmarking neuromorphic solutions, and outline the key features of NeuroBench. We believe that NeuroBench will be a significant step towards defining standards that can unify the goals of neuromorphic computing and drive its technological progress. Please visit neurobench.ai for the latest updates on the benchmark tasks and metrics.
Abstract:The event-driven and sparse nature of communication between spiking neurons in the brain holds great promise for flexible and energy-efficient AI. Recent advances in learning algorithms have demonstrated that recurrent networks of spiking neurons can be effectively trained to achieve competitive performance compared to standard recurrent neural networks. Still, as these learning algorithms use error-backpropagation through time (BPTT), they suffer from high memory requirements, are slow to train, and are incompatible with online learning. This limits the application of these learning algorithms to relatively small networks and to limited temporal sequence lengths. Online approximations to BPTT with lower computational and memory complexity have been proposed (e-prop, OSTL), but in practice also suffer from memory limitations and, as approximations, do not outperform standard BPTT training. Here, we show how a recently developed alternative to BPTT, Forward Propagation Through Time (FPTT) can be applied in spiking neural networks. Different from BPTT, FPTT attempts to minimize an ongoing dynamically regularized risk on the loss. As a result, FPTT can be computed in an online fashion and has fixed complexity with respect to the sequence length. When combined with a novel dynamic spiking neuron model, the Liquid-Time-Constant neuron, we show that SNNs trained with FPTT outperform online BPTT approximations, and approach or exceed offline BPTT accuracy on temporal classification tasks. This approach thus makes it feasible to train SNNs in a memory-friendly online fashion on long sequences and scale up SNNs to novel and complex neural architectures.
Abstract:As spiking-based deep learning inference applications are increasing in embedded systems, these systems tend to integrate neuromorphic accelerators such as $\mu$Brain to improve energy efficiency. We propose a $\mu$Brain-based scalable many-core neuromorphic hardware design to accelerate the computations of spiking deep convolutional neural networks (SDCNNs). To increase energy efficiency, cores are designed to be heterogeneous in terms of their neuron and synapse capacity (big cores have higher capacity than the little ones), and they are interconnected using a parallel segmented bus interconnect, which leads to lower latency and energy compared to a traditional mesh-based Network-on-Chip (NoC). We propose a system software framework called SentryOS to map SDCNN inference applications to the proposed design. SentryOS consists of a compiler and a run-time manager. The compiler compiles an SDCNN application into subnetworks by exploiting the internal architecture of big and little $\mu$Brain cores. The run-time manager schedules these sub-networks onto cores and pipeline their execution to improve throughput. We evaluate the proposed big little many-core neuromorphic design and the system software framework with five commonlyused SDCNN inference applications and show that the proposed solution reduces energy (between 37% and 98%), reduces latency (between 9% and 25%), and increases application throughput (between 20% and 36%). We also show that SentryOS can be easily extended for other spiking neuromorphic accelerators.