Chetan Singh Thakur

Neural Signal Compression using RAMAN tinyML Accelerator for BCI Applications

Apr 09, 2025

Adithya Krishna, Sohan Debnath, André van Schaik, Mahesh Mehendale, Chetan Singh Thakur

Abstract:High-quality, multi-channel neural recording is indispensable for neuroscience research and clinical applications. Large-scale brain recordings often produce vast amounts of data that must be wirelessly transmitted for subsequent offline analysis and decoding, especially in brain-computer interfaces (BCIs) utilizing high-density intracortical recordings with hundreds or thousands of electrodes. However, transmitting raw neural data presents significant challenges due to limited communication bandwidth and resultant excessive heating. To address this challenge, we propose a neural signal compression scheme utilizing Convolutional Autoencoders (CAEs), which achieves a compression ratio of up to 150 for compressing local field potentials (LFPs). The CAE encoder section is implemented on RAMAN, an energy-efficient tinyML accelerator designed for edge computing, and subsequently deployed on an Efinix Ti60 FPGA with 37.3k LUTs and 8.6k register utilization. RAMAN leverages sparsity in activation and weights through zero skipping, gating, and weight compression techniques. Additionally, we employ hardware-software co-optimization by pruning CAE encoder model parameters using a hardware-aware balanced stochastic pruning strategy, resolving workload imbalance issues and eliminating indexing overhead to reduce parameter storage requirements by up to 32.4%. Using the proposed compact depthwise separable convolutional autoencoder (DS-CAE) model, the compressed neural data from RAMAN is reconstructed offline with superior signal-to-noise and distortion ratios (SNDR) of 22.6 dB and 27.4 dB, along with R² scores of 0.81 and 0.94, respectively, evaluated on two monkey neural recordings.
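The abstract reports compression ratio, SNDR, and R² but does not define them. The sketch below shows how these figures are commonly computed for a reconstructed signal; the exact definitions used in the paper are an assumption here, and the function names, the 16-bit word widths, and the toy signal are all hypothetical.

```python
import math


def sndr_db(original, reconstructed):
    """Signal-to-noise-and-distortion ratio in dB (assumed definition:
    signal energy over reconstruction-error energy)."""
    signal_power = sum(x * x for x in original)
    error_power = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    return 10.0 * math.log10(signal_power / error_power)


def r2_score(original, reconstructed):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(original) / len(original)
    ss_tot = sum((x - mean) ** 2 for x in original)
    ss_res = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    return 1.0 - ss_res / ss_tot


def compression_ratio(n_input, n_latent, input_bits=16, latent_bits=16):
    """Ratio of raw bits to compressed bits (bit widths are illustrative)."""
    return (n_input * input_bits) / (n_latent * latent_bits)


if __name__ == "__main__":
    # Toy signal and a reconstruction with 10% amplitude error.
    x = [1.0, -1.0, 1.0, -1.0]
    x_hat = [0.9, -0.9, 0.9, -0.9]
    print(f"SNDR: {sndr_db(x, x_hat):.2f} dB")
    print(f"R2:   {r2_score(x, x_hat):.4f}")
    print(f"CR:   {compression_ratio(1500, 10):.0f}")
```

Under these definitions, a compression ratio of 150 would correspond to, for example, 1500 input samples encoded into 10 latent values of the same bit width.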

Neuromorphic Cameras in Astronomy: Unveiling the Future of Celestial Imaging Beyond Conventional Limits

Mar 20, 2025

Satyapreet Singh Yadav, Bikram Pradhan, Kenil Rajendrabhai Ajudiya, T. S. Kumar, Nirupam Roy, Andre Van Schaik, Chetan Singh Thakur

Abstract:To deepen our understanding of optical astronomy, we must advance imaging technology to overcome conventional frame-based cameras' limited dynamic range and temporal resolution. Our Perspective paper examines how neuromorphic cameras can effectively address these challenges. Drawing inspiration from the human retina, neuromorphic cameras excel in speed and high dynamic range by utilizing asynchronous pixel operation and logarithmic photocurrent conversion, making them highly effective for celestial imaging. We use a 1300 mm terrestrial telescope to demonstrate the neuromorphic camera's ability to simultaneously capture faint and bright celestial sources while preventing saturation effects. We illustrate its photometric capabilities through aperture photometry of a star field with faint stars. Detection of the faint gas cloud structure of the Trapezium cluster during a full moon night highlights the camera's high dynamic range, effectively mitigating static glare from lunar illumination. Our investigations also include detecting meteorites passing near the Moon and Earth, as well as imaging satellites and anthropogenic debris with exceptionally high temporal resolution using a 200 mm telescope. Our observations show the immense potential of neuromorphic cameras in advancing astronomical optical imaging and pushing the boundaries of observational astronomy.

* Optical astronomy, Neuromorphic camera, Photometry, Event-based, Asynchronous, High dynamic range, High temporal resolution, Meteorite imaging

Neuromorphic Retina: An FPGA-based Emulator

Jan 15, 2025

Prince Phillip, Pallab Kumar Nath, Kapil Jainwal, Andre van Schaik, Chetan Singh Thakur

Abstract:Implementing accurate models of the retina is a challenging task, particularly in the context of creating visual prosthetics and devices. Although diverse artificial renditions of the retina exist, the pursuit of a more realistic model remains imperative. In this work, we emulate a neuromorphic retina model on an FPGA. The key feature of this model is its powerful adaptation to luminance and contrast, which allows it to accurately emulate the sensitivity of the biological retina to changes in light levels. Phasic and tonic cells are realizable in the retina in the simplest way possible. Our FPGA implementation of the proposed biologically inspired digital retina, incorporating a receptive field with a center-surround structure, is reconfigurable and can support 128×128-pixel images at a frame rate of 200 fps. It consumes 1720 slices, approximately 3.7k Look-Up Tables (LUTs), and Flip-Flops (FFs) on the FPGA. This implementation provides a high-performance, low-power, and small-area solution and could be a significant step forward in the development of biologically plausible retinal prostheses with enhanced information processing capabilities.

KALAM: toolKit for Automating high-Level synthesis of Analog computing systeMs

Oct 30, 2024

Ankita Nandi, Krishil Gandhi, Mahendra Pratap Singh, Shantanu Chakrabartty, Chetan Singh Thakur

Abstract:Diverse computing paradigms have emerged to meet the growing needs for intelligent energy-efficient systems. The Margin Propagation (MP) framework, being one such initiative in the analog computing domain, stands out due to its scalability across biasing conditions, temperatures, and diminishing process technology nodes. However, the lack of digital-like automation tools for designing analog systems (including MP-based analog systems) hinders their adoption for designing large systems. The inherent scalability and modularity of MP systems present a unique opportunity in this regard. This paper introduces KALAM (toolKit for Automating high-Level synthesis of Analog computing systeMs), which leverages factor graphs as the foundational paradigm for synthesizing MP-based analog computing systems. Factor graphs are the basis of various signal processing tasks and, when coupled with MP, can be used to design scalable and energy-efficient analog signal processors. Using the Python scripting language, the KALAM automation flow translates an input factor graph to its equivalent SPICE-compatible circuit netlist that can be used to validate the intended functionality. KALAM also allows the integration of design optimization strategies such as precision tuning, variable elimination, and mathematical simplification. We demonstrate KALAM's versatility for tasks such as Bayesian inference, Low-Density Parity Check (LDPC) decoding, and Artificial Neural Networks (ANN). Simulation results of the netlists align closely with software implementations, affirming the efficacy of our proposed automation tool.

* 5 Pages, 4 figures

Low-latency machine learning FPGA accelerator for multi-qubit state discrimination

Jul 04, 2024

Pradeep Kumar Gautam, Shantharam Kalipatnapu, Shankaranarayanan H, Ujjawal Singhal, Benjamin Lienhard, Vibhor Singh, Chetan Singh Thakur

Abstract:Measuring a qubit is a fundamental yet error-prone operation in quantum computing. These errors can stem from various sources such as crosstalk, spontaneous state transitions, and excitation caused by the readout pulse. In this work, we utilize an integrated approach to deploy neural networks (NN) onto field programmable gate arrays (FPGA). We demonstrate that it is practical to design and implement a fully connected neural network accelerator for frequency-multiplexed readout, balancing computational complexity with low-latency requirements without significant loss in accuracy. The neural network is implemented by quantization of weights, activation functions, and inputs. The hardware accelerator performs frequency-multiplexed readout of 5 superconducting qubits in less than 50 ns on an RFSoC ZCU111 FPGA, which is the first of its kind in the literature. These modules can be implemented and integrated in existing quantum control and readout platforms using an RFSoC ZCU111 ready for experimental deployment.

* 10 pages, 6 figures

Margin Propagation based XOR-SAT Solvers for Decoding of LDPC Codes

Feb 07, 2024

Ankita Nandi, Shantanu Chakrabartty, Chetan Singh Thakur

Abstract:Decoding of Low-Density Parity Check (LDPC) codes can be viewed as a special case of XOR-SAT problems, for which low-computational complexity bit-flipping algorithms have been proposed in the literature. However, a performance gap exists between the bit-flipping LDPC decoding algorithms and the benchmark LDPC decoding algorithms, such as the Sum-Product Algorithm (SPA). In this paper, we propose an XOR-SAT solver using log-sum-exponential functions and demonstrate its advantages for LDPC decoding. This is then approximated using the Margin Propagation formulation to attain a low-complexity LDPC decoder. The proposed algorithm uses soft information to decide the bit-flips that maximize the number of parity check constraints satisfied over an optimization function. The proposed solver can achieve results that are within 0.1 dB of the Sum-Product Algorithm for the same number of code iterations. It is also at least 10x lesser than other Gradient-Descent Bit Flipping decoding algorithms, which are also bit-flipping algorithms based on optimization functions. The approximation using the Margin Propagation formulation does not require any multipliers, resulting in significantly lower computational complexity than other soft-decision Bit-Flipping LDPC decoders.
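The abstract says the log-sum-exponential function is approximated with Margin Propagation (MP) but does not state the formulation. A minimal sketch follows, assuming the constraint commonly given in the MP literature: for inputs L_i and a hyperparameter gamma > 0, MP returns the z that satisfies sum_i max(0, L_i - z) = gamma. The function names and the sort-based solver are illustrative choices; this is a floating-point software reference that uses a division, not the multiplierless hardware form the paper describes.

```python
import math


def margin_propagation(values, gamma):
    """Solve sum_i max(0, v_i - z) = gamma for z (assumed MP constraint).

    Uses a sort-based "reverse water-filling" search: try the k largest
    inputs as the active set and accept the first consistent solution.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    ordered = sorted(values, reverse=True)
    running_sum = 0.0
    for k, value in enumerate(ordered, start=1):
        running_sum += value
        z = (running_sum - gamma) / k
        # The next-largest input must not be above z, otherwise it is active too.
        next_value = ordered[k] if k < len(ordered) else -math.inf
        if z >= next_value:
            return z
    raise AssertionError("unreachable: the full set always yields a solution")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for comparison."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


if __name__ == "__main__":
    inputs = [3.0, 1.0, 0.0]
    z = margin_propagation(inputs, gamma=1.0)
    residual = sum(max(0.0, v - z) for v in inputs)
    print(f"MP output z = {z}, constraint sum = {residual}")
    print(f"log-sum-exp = {log_sum_exp(inputs):.4f}")
```

The solver only needs comparisons, additions, and one scaling per candidate set, which hints at why MP suits hardware built from adders, shifts, and comparators; how closely z tracks the log-sum-exp value depends on the choice of gamma.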

* 12 pages, 7 figures, Paper submitted to IEEE Transactions on Communications

RAMAN: A Re-configurable and Sparse tinyML Accelerator for Inference on Edge

Jun 10, 2023

Adithya Krishna, Srikanth Rohit Nudurupati, Chandana D G, Pritesh Dwivedi, André van Schaik, Mahesh Mehendale, Chetan Singh Thakur

Abstract:Deep Neural Network (DNN) based inference at the edge is challenging as these compute and data-intensive algorithms need to be implemented at low cost and low power while meeting the latency constraints of the target applications. Sparsity, in both activations and weights inherent to DNNs, is a key knob to leverage. In this paper, we present RAMAN, a Re-configurable and spArse tinyML Accelerator for infereNce on edge, architected to exploit the sparsity to reduce area (storage), power, and latency. RAMAN can be configured to support a wide range of DNN topologies, consisting of different convolution layer types and a range of layer parameters (feature-map size and the number of channels). RAMAN can also be configured to support accuracy vs power/latency tradeoffs using techniques deployed at compile-time and run-time. We present the salient features of the architecture, provide implementation results, and compare them with the state-of-the-art. RAMAN employs a novel dataflow inspired by Gustavson's algorithm that has optimal input activation (IA) and output activation (OA) reuse to minimize memory access and the overall data movement cost. The dataflow allows RAMAN to locally reduce the partial sum (Psum) within a processing element array to eliminate the Psum writeback traffic. Additionally, we suggest a method to reduce peak activation memory by overlapping IA and OA on the same memory space, which can reduce storage requirements by up to 50%. RAMAN was implemented on a low-power and resource-constrained Efinix Ti60 FPGA with 37.2K LUTs and 8.6K register utilization. RAMAN processes all layers of the MobileNetV1 model at 98.47 GOp/s/W and the DS-CNN model at 79.68 GOp/s/W by leveraging both weight and activation sparsity.

Neuromorphic Computing with AER using Time-to-Event-Margin Propagation

Apr 27, 2023

Madhuvanthi Srivatsav R, Shantanu Chakrabartty, Chetan Singh Thakur

Abstract:Address-Event-Representation (AER) is a spike-routing protocol that allows the scaling of neuromorphic and spiking neural network (SNN) architectures to a size that is comparable to that of digital neural network architectures. However, in conventional neuromorphic architectures, the AER protocol and, in general, any virtual interconnect plays only a passive role in computation, i.e., only for routing spikes and events. In this paper, we show how causal temporal primitives like delay, triggering, and sorting inherent in the AER protocol itself can be exploited for scalable neuromorphic computing using our proposed technique called Time-to-Event Margin Propagation (TEMP). The proposed TEMP-based AER architecture is fully asynchronous and relies on interconnect delays for memory and computing as opposed to conventional and local multiply-and-accumulate (MAC) operations. We show that the time-based encoding in the TEMP neural network produces a spatio-temporal representation that can encode a large number of discriminatory patterns. As a proof-of-concept, we show that a trained TEMP-based convolutional neural network (CNN) can demonstrate an accuracy greater than 99% on the MNIST dataset. Overall, our work is a biologically inspired computing paradigm that brings forth a new dimension of research to the field of neuromorphic computing.

Multiplierless In-filter Computing for tinyML Platforms

Apr 24, 2023

Abhishek Ramdas Nair, Pallab Kumar Nath, Shantanu Chakrabartty, Chetan Singh Thakur

Abstract:Wildlife conservation using continuous monitoring of environmental factors and biomedical classification, which generate a vast amount of sensor data, is a challenge due to limited bandwidth in the case of remote monitoring. It becomes critical to perform classification where data is generated, so that only classified data is used for monitoring. We present a novel multiplierless framework for in-filter acoustic classification using Margin Propagation (MP) approximation, for use in low-power edge devices deployable in remote areas with limited connectivity. The entire design of this classification framework is based on a template-based kernel machine, which includes feature extraction and inference, and uses basic primitives like addition/subtraction, shift, and comparator operations for hardware implementation. Unlike full-precision training methods for traditional classification, we use MP-based approximation for training, including backpropagation, which mitigates approximation errors. The proposed framework is general enough for acoustic classification. However, we demonstrate the hardware friendliness of this framework by implementing a parallel Finite Impulse Response (FIR) filter bank in a kernel machine classifier optimized for a Field Programmable Gate Array (FPGA). The FIR filter acts as the feature extractor and non-linear kernel for the kernel machine implemented using MP approximation and a downsampling method to reduce the order of the filters. The FPGA implementation on Spartan 7 shows that the MP-approximated in-filter kernel machine is more efficient than traditional classification frameworks, using just under 1K slices.

Theroretical Insight into Batch Normalization: Data Dependant Auto-Tuning of Regularization Rate

Sep 15, 2022

Lakshmi Annamalai, Chetan Singh Thakur

Abstract:Batch normalization is widely used in deep learning to normalize intermediate activations. Deep networks suffer from notoriously increased training complexity, mandating careful initialization of weights, requiring lower learning rates, etc. These issues have been addressed by Batch Normalization (BN), which normalizes the inputs of activations to zero mean and unit standard deviation. Making this batch normalization part of the training process dramatically accelerates the training of very deep networks. A new line of research examines the exact theoretical explanation behind the success of BN. Most of these theoretical insights attempt to explain the benefits of BN by attributing them to its influence on optimization, weight scale invariance, and regularization. Despite BN's undeniable success in accelerating generalization, an analytical link between the effect of BN and the regularization parameter is still missing. This paper aims to bring out the data-dependent auto-tuning of the regularization parameter by BN with analytical proofs. We have posed BN as a constrained optimization imposed on non-BN weights, through which we demonstrate its data-statistics-dependent auto-tuning of the regularization parameter. We have also given analytical proof for its behavior under a noisy input scenario, which reveals the signal vs. noise tuning of the regularization parameter. We have also substantiated our claim with empirical results from the MNIST dataset experiments.
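The normalization step the abstract refers to can be sketched as follows. This is a minimal illustration of standard batch normalization over a single feature, assuming the usual small `eps` stabilizer and omitting the learnable scale and shift; it does not reproduce the paper's constrained-optimization analysis, and the function name is hypothetical.

```python
import math


def batch_normalize(batch, eps=1e-5):
    """Normalize one feature across a mini-batch to roughly zero mean
    and unit standard deviation (learnable scale/shift omitted)."""
    n = len(batch)
    mean = sum(batch) / n
    variance = sum((x - mean) ** 2 for x in batch) / n
    scale = 1.0 / math.sqrt(variance + eps)
    return [(x - mean) * scale for x in batch]


if __name__ == "__main__":
    activations = [1.0, 2.0, 3.0, 4.0]
    normalized = batch_normalize(activations)
    print([round(v, 4) for v in normalized])
```

Because the mean and variance are computed from the current batch, the effective rescaling of the weights depends on the data statistics, which is the dependence the paper analyzes.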
