Abstract:The performance of deep learning algorithms such as neural networks (NNs) has increased tremendously recently, and they can achieve state-of-the-art performance in many domains. However, due to memory and computation resource constraints, implementing NNs on edge devices is a challenging task. Therefore, hardware accelerators such as computation-in-memory (CIM) with memristive devices have been developed to accelerate the most common operations, i.e., matrix-vector multiplication. However, due to inherent device properties, external environmental factors such as temperature, and an immature fabrication process, memristors suffer from various non-idealities, including defects and variations occurring during manufacturing and runtime. Consequently, there is a lack of complete confidence in the predictions made by the model. To improve confidence in NN predictions made by hardware accelerators in the presence of device non-idealities, in this paper, we propose a Bayesian test vector generation framework that can estimate the model uncertainty of NNs implemented on memristor-based CIM hardware. Compared to the conventional point estimate test vector generation method, our method is more generalizable across different model dimensions and requires storing only one test Bayesian vector in the hardware. Our method is evaluated on different model dimensions, tasks, fault rates, and variation noise to show that it can consistently achieve $100\%$ coverage with only $0.024$ MB of memory overhead.
Abstract:The applications of artificial intelligence (AI) are rapidly evolving, and they are also commonly used in safety-critical domains, such as autonomous driving and medical diagnosis, where functional safety is paramount. In AI-driven systems, uncertainty estimation allows the user to avoid overconfidence predictions and achieve functional safety. Therefore, the robustness and reliability of model predictions can be improved. However, conventional uncertainty estimation methods, such as the deep ensemble method, impose high computation and, accordingly, hardware (latency and energy) overhead because they require the storage and processing of multiple models. Alternatively, Monte Carlo dropout (MC-dropout) methods, although having low memory overhead, necessitate numerous ($\sim 100$) forward passes, leading to high computational overhead and latency. Thus, these approaches are not suitable for battery-powered edge devices with limited computing and memory resources. In this paper, we propose the Tiny-Deep Ensemble approach, a low-cost approach for uncertainty estimation on edge devices. In our approach, only normalization layers are ensembled $M$ times, with all ensemble members sharing common weights and biases, leading to a significant decrease in storage requirements and latency. Moreover, our approach requires only one forward pass in a hardware architecture that allows batch processing for inference and uncertainty estimation. Furthermore, it has approximately the same memory overhead compared to a single model. Therefore, latency and memory overhead are reduced by a factor of up to $\sim M\times$. Nevertheless, our method does not compromise accuracy, with an increase in inference accuracy of up to $\sim 1\%$ and a reduction in RMSE of $17.17\%$ in various benchmark datasets, tasks, and state-of-the-art architectures.
Abstract:Bayesian Neural Networks (BayNNs) naturally provide uncertainty in their predictions, making them a suitable choice in safety-critical applications. Additionally, their realization using memristor-based in-memory computing (IMC) architectures enables them for resource-constrained edge applications. In addition to predictive uncertainty, however, the ability to be inherently robust to noise in computation is also essential to ensure functional safety. In particular, memristor-based IMCs are susceptible to various sources of non-idealities such as manufacturing and runtime variations, drift, and failure, which can significantly reduce inference accuracy. In this paper, we propose a method to inherently enhance the robustness and inference accuracy of BayNNs deployed in IMC architectures. To achieve this, we introduce a novel normalization layer combined with stochastic affine transformations. Empirical results in various benchmark datasets show a graceful degradation in inference accuracy, with an improvement of up to $58.11\%$.
Abstract:Neural networks (NNs) can achieved high performance in various fields such as computer vision, and natural language processing. However, deploying NNs in resource-constrained safety-critical systems has challenges due to uncertainty in the prediction caused by out-of-distribution data, and hardware non-idealities. To address the challenges of deploying NNs in resource-constrained safety-critical systems, this paper summarizes the (4th year) PhD thesis work that explores scalable and efficient methods for uncertainty estimation and reduction in deep learning, with a focus on Computation-in-Memory (CIM) using emerging resistive non-volatile memories. We tackle the inherent uncertainties arising from out-of-distribution inputs and hardware non-idealities, crucial in maintaining functional safety in automated decision-making systems. Our approach encompasses problem-aware training algorithms, novel NN topologies, and hardware co-design solutions, including dropout-based \emph{binary} Bayesian Neural Networks leveraging spintronic devices and variational inference techniques. These innovations significantly enhance OOD data detection, inference accuracy, and energy efficiency, thereby contributing to the reliability and robustness of NN implementations.
Abstract:Internet of Things (IoT) and smart wearable devices for personalized healthcare will require storing and computing ever-increasing amounts of data. The key requirements for these devices are ultra-low-power, high-processing capabilities, autonomy at low cost, as well as reliability and accuracy to enable Green AI at the edge. Artificial Intelligence (AI) models, especially Bayesian Neural Networks (BayNNs) are resource-intensive and face challenges with traditional computing architectures due to the memory wall problem. Computing-in-Memory (CIM) with emerging resistive memories offers a solution by combining memory blocks and computing units for higher efficiency and lower power consumption. However, implementing BayNNs on CIM hardware, particularly with spintronic technologies, presents technical challenges due to variability and manufacturing defects. The NeuSPIN project aims to address these challenges through full-stack hardware and software co-design, developing novel algorithmic and circuit design approaches to enhance the performance, energy-efficiency and robustness of BayNNs on sprintronic-based CIM platforms.
Abstract:Bayesian Neural Networks (BayNNs) can inherently estimate predictive uncertainty, facilitating informed decision-making. Dropout-based BayNNs are increasingly implemented in spintronics-based computation-in-memory architectures for resource-constrained yet high-performance safety-critical applications. Although uncertainty estimation is important, the reliability of Dropout generation and BayNN computation is equally important for target applications but is overlooked in existing works. However, testing BayNNs is significantly more challenging compared to conventional NNs, due to their stochastic nature. In this paper, we present for the first time the model of the non-idealities of the spintronics-based Dropout module and analyze their impact on uncertainty estimates and accuracy. Furthermore, we propose a testing framework based on repeatability ranking for Dropout-based BayNN with up to $100\%$ fault coverage while using only $0.2\%$ of training data as test vectors.
Abstract:Neural networks (NNs) are increasingly used in always-on safety-critical applications deployed on hardware accelerators (NN-HAs) employing various memory technologies. Reliable continuous operation of NN is essential for safety-critical applications. During online operation, NNs are susceptible to single and multiple permanent and soft errors due to factors such as radiation, aging, and thermal effects. Explicit NN-HA testing methods cannot detect transient faults during inference, are unsuitable for always-on applications, and require extensive test vector generation and storage. Therefore, in this paper, we propose the \emph{uncertainty fingerprint} approach representing the online fault status of NN. Furthermore, we propose a dual head NN topology specifically designed to produce uncertainty fingerprints and the primary prediction of the NN in \emph{a single shot}. During the online operation, by matching the uncertainty fingerprint, we can concurrently self-test NNs with up to $100\%$ coverage with a low false positive rate while maintaining a similar performance of the primary task. Compared to existing works, memory overhead is reduced by up to $243.7$ MB, multiply and accumulate (MAC) operation is reduced by up to $10000\times$, and false-positive rates are reduced by up to $89\%$.
Abstract:Uncertainty estimation in Neural Networks (NNs) is vital in improving reliability and confidence in predictions, particularly in safety-critical applications. Bayesian Neural Networks (BayNNs) with Dropout as an approximation offer a systematic approach to quantifying uncertainty, but they inherently suffer from high hardware overhead in terms of power, memory, and computation. Thus, the applicability of BayNNs to edge devices with limited resources or to high-performance applications is challenging. Some of the inherent costs of BayNNs can be reduced by accelerating them in hardware on a Computation-In-Memory (CIM) architecture with spintronic memories and binarizing their parameters. However, numerous stochastic units are required to implement conventional dropout-based BayNN. In this paper, we propose the Scale Dropout, a novel regularization technique for Binary Neural Networks (BNNs), and Monte Carlo-Scale Dropout (MC-Scale Dropout)-based BayNNs for efficient uncertainty estimation. Our approach requires only one stochastic unit for the entire model, irrespective of the model size, leading to a highly scalable Bayesian NN. Furthermore, we introduce a novel Spintronic memory-based CIM architecture for the proposed BayNN that achieves more than $100\times$ energy savings compared to the state-of-the-art. We validated our method to show up to a $1\%$ improvement in predictive performance and superior uncertainty estimates compared to related works.
Abstract:Recently, machine learning systems have gained prominence in real-time, critical decision-making domains, such as autonomous driving and industrial automation. Their implementations should avoid overconfident predictions through uncertainty estimation. Bayesian Neural Networks (BayNNs) are principled methods for estimating predictive uncertainty. However, their computational costs and power consumption hinder their widespread deployment in edge AI. Utilizing Dropout as an approximation of the posterior distribution, binarizing the parameters of BayNNs, and further to that implementing them in spintronics-based computation-in-memory (CiM) hardware arrays provide can be a viable solution. However, designing hardware Dropout modules for convolutional neural network (CNN) topologies is challenging and expensive, as they may require numerous Dropout modules and need to use spatial information to drop certain elements. In this paper, we introduce MC-SpatialDropout, a spatial dropout-based approximate BayNNs with spintronics emerging devices. Our method utilizes the inherent stochasticity of spintronic devices for efficient implementation of the spatial dropout module compared to existing implementations. Furthermore, the number of dropout modules per network layer is reduced by a factor of $9\times$ and energy consumption by a factor of $94.11\times$, while still achieving comparable predictive performance and uncertainty estimates compared to related works.
Abstract:Neural networks (NNs) are capable of learning complex patterns and relationships in data to make predictions with high accuracy, making them useful for various tasks. However, NNs are both computation-intensive and memory-intensive methods, making them challenging for edge applications. To accelerate the most common operations (matrix-vector multiplication) in NNs, hardware accelerator architectures such as computation-in-memory (CiM) with non-volatile memristive crossbars are utilized. Although they offer benefits such as power efficiency, parallelism, and nonvolatility, they suffer from various faults and variations, both during manufacturing and lifetime operations. This can lead to faulty computations and, in turn, degradation of post-mapping inference accuracy, which is unacceptable for many applications, including safety-critical applications. Therefore, proper testing of NN hardware accelerators is required. In this paper, we propose a \emph{one-shot} testing approach that can test NNs accelerated on memristive crossbars with only one test vector, making it very suitable for online testing applications. Our approach can consistently achieve $100\%$ fault coverage across several large topologies with up to $201$ layers and challenging tasks like semantic segmentation. Nevertheless, compared to existing methods, the fault coverage is improved by up to $24\%$, the memory overhead is only $0.0123$ MB, a reduction of up to $19980\times$ and the number of test vectors is reduced by $10000\times$.