Abstract:Lifelong learning - an agent's ability to learn throughout its lifetime - is a hallmark of biological learning systems and a central challenge for artificial intelligence (AI). The development of lifelong learning algorithms could lead to a range of novel AI applications, but this will also require the development of appropriate hardware accelerators, particularly if the models are to be deployed on edge platforms, which have strict size, weight, and power constraints. Here, we explore the design of lifelong learning AI accelerators that are intended for deployment in untethered environments. We identify key desirable capabilities for lifelong learning accelerators and highlight metrics to evaluate such accelerators. We then discuss current edge AI accelerators and explore the future design of lifelong learning accelerators, considering the role that different emerging technologies could play.
Abstract:The ability to learn continuously from an incoming data stream without catastrophic forgetting is critical to designing intelligent systems. Many approaches to continual learning rely on stochastic gradient descent and its variants that employ global error updates, and hence must adopt strategies such as memory buffers or replay to circumvent its stability, greediness, and short-term memory limitations. To address these limitations, we have developed a biologically inspired, lightweight neural network architecture that incorporates synaptic plasticity mechanisms and neuromodulation and hence learns through local error signals, enabling online continual learning without stochastic gradient descent. Our approach leads to superior online continual learning performance on the Split-MNIST, Split-CIFAR-10, and Split-CIFAR-100 datasets compared with other memory-constrained learning approaches and matches that of state-of-the-art, memory-intensive replay-based approaches. We further demonstrate the effectiveness of our approach by integrating key design concepts into other backpropagation-based continual learning algorithms, significantly improving their accuracy. Our results provide compelling evidence for the importance of incorporating biological principles into machine learning models and offer insights into how we can leverage them to design more efficient and robust systems for online continual learning.
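To make the idea of learning through local error signals concrete, here is a minimal sketch of a neuromodulated, three-factor style weight update applied to one streamed sample. All names, shapes, and the specific update form are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def local_plasticity_update(w, pre, post, modulator, lr=0.01, decay=1e-4):
    """Hedged sketch of a local, neuromodulated (three-factor) update.

    w         : (n_post, n_pre) synaptic weight matrix
    pre       : (n_pre,) presynaptic activity for the current sample
    post      : (n_post,) postsynaptic activity
    modulator : scalar neuromodulatory signal, e.g. a locally computed
                error or reward term (not a global backpropagated error)
    """
    # Hebbian-style correlation term gated by the neuromodulator.
    dw = lr * modulator * np.outer(post, pre)
    # Mild decay keeps weights bounded without a memory buffer or replay.
    return w + dw - decay * w

# Toy usage: one online step on a single streamed sample.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(10, 64))
pre = rng.random(64)
post = np.tanh(w @ pre)
w = local_plasticity_update(w, pre, post, modulator=0.5)
```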
Abstract:In this work we have extended AutoML-inspired approaches to the exploration and optimization of neuromorphic architectures. Through the integration of a parallel, asynchronous, model-based search with a framework for simulating spiking architectures, we are able to efficiently explore the configuration space of neuromorphic architectures and identify the subset of conditions leading to the highest performance in a targeted application. We have demonstrated this approach on an exemplar case of a real-time, on-chip learning application. Our results indicate that we can effectively use such optimization approaches to tune complex architectures, thereby providing a viable pathway towards application-driven codesign.
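As a rough illustration of the interface such a workflow relies on, the sketch below pairs a stand-in spiking-simulator evaluation with asynchronous, parallel evaluation of candidate configurations. The configuration fields, the `evaluate_architecture` stub, and the use of random sampling instead of a model-based proposer are all hypothetical placeholders.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed
import random

def sample_config(rng):
    """Hypothetical configuration space for a spiking architecture."""
    return {
        "n_hidden": rng.choice([64, 128, 256]),
        "threshold": rng.uniform(0.5, 2.0),
        "tau_mem_ms": rng.uniform(5.0, 50.0),
        "learning_rate": 10 ** rng.uniform(-4, -1),
    }

def evaluate_architecture(config):
    """Stand-in for a spiking-network simulation; it would return the
    on-chip learning performance obtained with this configuration."""
    # Placeholder score so the sketch runs end to end.
    return -abs(config["threshold"] - 1.0) - abs(config["tau_mem_ms"] - 20.0) / 100

if __name__ == "__main__":
    rng = random.Random(0)
    candidates = [sample_config(rng) for _ in range(16)]
    # Asynchronous, parallel evaluation: results are consumed as they finish,
    # so a model-based search could propose new candidates without waiting
    # for the slowest simulation.
    with ProcessPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(evaluate_architecture, c): c for c in candidates}
        best = max(as_completed(futures), key=lambda f: f.result())
        print("best config:", futures[best], "score:", best.result())
```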
Abstract:Despite the advancement of machine learning techniques in recent years, state-of-the-art systems lack robustness to "real world" events, where the input distributions and tasks encountered by the deployed systems will not be limited to the original training context, and systems will instead need to adapt to novel distributions and tasks while deployed. This critical gap may be addressed through the development of "Lifelong Learning" systems that are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability. Unfortunately, efforts to improve these capabilities are typically treated as distinct areas of research that are assessed independently, without regard to the impact of each separate capability on other aspects of the system. We instead propose a holistic approach, using a suite of metrics and an evaluation framework to assess Lifelong Learning in a principled way that is agnostic to specific domains or system techniques. Through five case studies, we show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems. We highlight how the proposed suite of metrics quantifies performance trade-offs present during Lifelong Learning system development - both the widely discussed Stability-Plasticity dilemma and the newly proposed relationship between Sample-Efficient and Robust Learning. Further, we make recommendations for the formulation and use of metrics to guide the continuing development of Lifelong Learning systems and assess their progress in the future.
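For orientation, the sketch below computes two common accuracy-matrix-based continual learning quantities (forgetting and backward transfer). These are standard illustrative metrics only; the suite of metrics proposed in the paper is broader and defined differently.

```python
import numpy as np

def transfer_metrics(R):
    """Illustrative continual-learning metrics from a task-accuracy matrix.

    R[i, j] = accuracy on task j after training on tasks 0..i.
    """
    T = R.shape[0]
    final = R[-1]
    # Performance maintenance / forgetting: drop from best-ever to final accuracy.
    forgetting = np.mean([np.max(R[:, j]) - final[j] for j in range(T - 1)])
    # Backward transfer: how training on later tasks changed earlier-task accuracy.
    bwt = np.mean([final[j] - R[j, j] for j in range(T - 1)])
    return {"forgetting": forgetting, "backward_transfer": bwt}

# Toy 3-task accuracy matrix (rows: after training task i, cols: task j).
R = np.array([[0.95, 0.10, 0.12],
              [0.80, 0.93, 0.15],
              [0.75, 0.85, 0.94]])
print(transfer_metrics(R))
```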
Abstract:We have developed a model for online continual, or lifelong, reinforcement learning (RL) inspired by the insect brain. Our model leverages the offline training of a feature extraction layer and a common general policy layer to enable the convergence of RL algorithms in online settings. Sharing a common policy layer across tasks leads to positive backward transfer, in which the agent continuously improves on older tasks that share the same underlying general policy. Biologically inspired restrictions on the agent's network are key to the convergence of the RL algorithms. This provides a pathway towards efficient online RL in resource-constrained scenarios.
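The architectural idea can be sketched as a pretrained, frozen feature extractor plus a policy layer shared across tasks, with only a small task-specific readout adapted online. Layer sizes, names, and the exact split of frozen versus trainable parameters below are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class SharedPolicyAgent(nn.Module):
    """Hedged sketch: frozen features, a policy layer shared across tasks,
    and a task-specific readout that learns online."""

    def __init__(self, obs_dim, n_actions, feat_dim=64):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())
        self.shared_policy = nn.Linear(feat_dim, feat_dim)   # shared across tasks
        self.task_head = nn.Linear(feat_dim, n_actions)      # adapted online per task

        # Feature extractor pretrained offline and kept frozen online.
        for p in self.features.parameters():
            p.requires_grad = False

    def forward(self, obs):
        h = torch.relu(self.shared_policy(self.features(obs)))
        return self.task_head(h)  # action logits

agent = SharedPolicyAgent(obs_dim=8, n_actions=4)
logits = agent(torch.randn(1, 8))
action = torch.distributions.Categorical(logits=logits).sample()
```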
Abstract:In this work we explore the application of deep neural networks to the optimization of atomic layer deposition (ALD) processes based on thickness values obtained at different points of an ALD reactor. We introduce a dataset designed to train neural networks to predict saturation times based on the dose time and the thickness values measured at different points of the reactor for a single experimental condition. We then explore different artificial neural network configurations, including depth (number of hidden layers) and size (number of neurons in each layer), to better understand the size and complexity that neural networks should have to achieve high predictive accuracy. The results show that trained neural networks can accurately predict saturation times without requiring any prior information on the surface kinetics. This provides a viable approach to minimizing the number of experiments required to optimize new ALD processes in a known reactor. However, the datasets and training procedure depend on the reactor geometry.
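A minimal sketch of the regression setup is shown below: the input concatenates a dose time with thickness values measured at several reactor positions, and a small fully connected network predicts the saturation time. The number of measurement points, layer sizes, synthetic data, and training loop are all illustrative assumptions.

```python
import torch
import torch.nn as nn

N_POINTS = 8  # assumed number of thickness measurement positions along the reactor

model = nn.Sequential(
    nn.Linear(1 + N_POINTS, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),  # predicted saturation time
)

# Synthetic stand-in batch: [dose_time, thickness_1, ..., thickness_N].
x = torch.rand(64, 1 + N_POINTS)
y_true = torch.rand(64, 1)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y_true)
    loss.backward()
    opt.step()
```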
Abstract:Neuromorphic architectures are ideally suited for the implementation of smart sensors able to react, learn, and respond to a changing environment. Our work uses the insect brain as a model to understand how heterogeneous architectures, incorporating different types of neurons and encodings, can be leveraged to create systems that integrate input processing, evaluation, and response. Here we show how the combination of time and rate encodings can lead to fast sensors that generate a hypothesis about the input in only a few cycles and then use that hypothesis as a secondary input for more detailed analysis.
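To illustrate the two encodings being combined, here is a toy example of a first-spike latency code (fast, coarse hypothesis after a few time steps) alongside a rate code (slower, more detailed evidence). The encoders are generic illustrations; how the hypothesis is fed back as a secondary input is not shown.

```python
import numpy as np

def latency_code(x, t_max=8):
    """First-spike latency code: stronger inputs spike earlier."""
    return np.clip(np.round((1.0 - x) * (t_max - 1)), 0, t_max - 1).astype(int)

def rate_code(x, t_max=64, rng=None):
    """Rate code: stochastic spike raster whose counts reflect input intensity."""
    rng = rng or np.random.default_rng(0)
    return (rng.random((t_max, x.size)) < x).astype(int)

x = np.array([0.9, 0.2, 0.6])     # normalized sensor channels
first_spikes = latency_code(x)    # early hypothesis from spike timing
raster = rate_code(x)             # detailed evidence from spike counts
print(first_spikes, raster.sum(axis=0))
```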
Abstract:We focus on the problem of how to achieve online continual learning under memory-constrained conditions where the input data may not be known a priori. These constraints are relevant in edge computing scenarios. We have developed an architecture in which input processing over data streams and online learning are integrated within a single recurrent network. This allows us to cast metalearning optimization as a mixed-integer optimization problem, in which different synaptic plasticity algorithms and feature extraction layers can be swapped out and their hyperparameters optimized to identify optimal architectures for different sets of tasks. We utilize a Bayesian optimization method to search over a design space that spans multiple learning algorithms, their specific hyperparameters, and feature extraction layers. We demonstrate our approach on online non-incremental and class-incremental learning tasks. Our optimization algorithm finds configurations that achieve superior continual learning performance on the Split-MNIST and Permuted-MNIST datasets compared with other memory-constrained learning approaches, and it matches that of state-of-the-art memory replay-based approaches without explicit data storage and replay. Our approach also allows us to explore the transferability of optimal learning conditions to tasks and datasets that have not been previously seen. We demonstrate that the accuracy of our transfer metalearning across datasets can be largely explained through a transfer coefficient based on metrics of dimensionality and distance between datasets.
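The sketch below shows what a mixed (categorical plus continuous) metalearning design space of this kind can look like, with a stand-in objective function. The rule names, ranges, and objective are illustrative assumptions, and the random sampling here is only a placeholder for a Bayesian optimizer.

```python
import random

def sample_config(rng):
    """Hypothetical mixed-integer search space: categorical choices of
    plasticity rule and feature extractor, plus continuous hyperparameters."""
    return {
        "plasticity_rule": rng.choice(["oja", "bcm", "three_factor"]),
        "feature_extractor": rng.choice(["raw", "random_projection", "conv_frozen"]),
        "learning_rate": 10 ** rng.uniform(-4, -1),   # log-uniform
        "decay": 10 ** rng.uniform(-5, -2),
    }

def run_continual_learning(config):
    """Stand-in objective: would build the recurrent network with the chosen
    rule and features, stream the tasks, and return mean final accuracy."""
    return random.random()  # placeholder so the sketch executes

rng = random.Random(0)
best = max((sample_config(rng) for _ in range(32)), key=run_continual_learning)
print(best)
```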
Abstract:In this work we explore recurrent representations of leaky integrate-and-fire neurons operating at a timescale equal to their absolute refractory period. Our coarse-timescale approximation is obtained using a probability distribution for spike arrivals that is homogeneous over this time interval. This leads to a discrete representation that exhibits the same dynamics as the continuous model, enabling efficient large-scale simulations and backpropagation through the recurrent implementation. We use this approach to explore the training of deep spiking neural networks, including convolutional, all-to-all, and max-pooling layers, directly in PyTorch. We found that the recurrent model leads to high classification accuracy using spike trains just four steps long during training. We also observed good transfer back to continuous implementations of leaky integrate-and-fire neurons. Finally, we applied this approach to some of the standard control problems as a first step to explore reinforcement learning using neuromorphic chips.
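A minimal sketch of a discrete-time LIF recurrence trained with backpropagation through time is given below. It uses a generic straight-through surrogate gradient and a simple decay-and-reset step; the paper's refractory-period-matched derivation and gradient treatment are not reproduced, and all constants are illustrative.

```python
import torch
import torch.nn as nn

class SpikeFn(torch.autograd.Function):
    """Heaviside spike with a boxcar surrogate gradient so the recurrent
    spiking dynamics can be trained with backprop."""
    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()
    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        return grad_out * (v.abs() < 0.5).float()

class DiscreteLIF(nn.Module):
    """Hedged sketch of a discrete-time LIF layer stepped once per
    refractory period; decay and threshold are illustrative."""
    def __init__(self, in_dim, out_dim, decay=0.9, threshold=1.0):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)
        self.decay, self.threshold = decay, threshold

    def forward(self, x_seq):                 # x_seq: (T, batch, in_dim)
        v = torch.zeros(x_seq.shape[1], self.fc.out_features)
        spikes = []
        for x_t in x_seq:                     # unrolled recurrence -> BPTT
            v = self.decay * v + self.fc(x_t)
            s = SpikeFn.apply(v - self.threshold)
            v = v * (1.0 - s)                 # reset after a spike
            spikes.append(s)
        return torch.stack(spikes)            # (T, batch, out_dim)

# Four-step spike train, matching the short-train regime described above.
x = torch.rand(4, 16, 784)
out = DiscreteLIF(784, 10)(x)
loss = out.mean(0).sum()
loss.backward()
```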
Abstract:The ability to learn and adapt in real time is a central feature of biological systems. Neuromorphic architectures demonstrating such versatility can greatly enhance our ability to efficiently process information at the edge. A key challenge, however, is to understand which learning rules are best suited for specific tasks and how the relevant hyperparameters can be fine-tuned. In this work, we introduce a conceptual framework in which the learning process is integrated into the network itself. This allows us to cast meta-learning as a mathematical optimization problem. We employ DeepHyper, a scalable, asynchronous model-based search, to simultaneously optimize the choice of meta-learning rules and their hyperparameters. We demonstrate our approach with two different datasets, MNIST and FashionMNIST, using a network architecture inspired by the learning center of the insect brain. Our results show that optimal learning rules can be dataset-dependent even within similar tasks. This dependency demonstrates the importance of introducing versatility and flexibility in the learning algorithms. It also illuminates experimental findings in insect neuroscience that have shown a heterogeneity of learning rules within the insect mushroom body.