Abstract:A new trans-disciplinary knowledge area, Edge Artificial Intelligence or Edge Intelligence, is beginning to receive a tremendous amount of interest from the machine learning community due to the ever increasing popularization of the Internet of Things (IoT). Unfortunately, the incorporation of AI characteristics to edge computing devices presents the drawbacks of being power and area hungry for typical machine learning techniques such as Convolutional Neural Networks (CNN). In this work, we propose a new power-and-area-efficient architecture for implementing Articial Neural Networks (ANNs) in hardware, based on the exploitation of correlation phenomenon in Stochastic Computing (SC) systems. The architecture purposed can solve the difficult implementation challenges that SC presents for CNN applications, such as the high resources used in binary-tostochastic conversion, the inaccuracy produced by undesired correlation between signals, and the stochastic maximum function implementation. Compared with traditional binary logic implementations, experimental results showed an improvement of 19.6x and 6.3x in terms of speed performance and energy efficiency, for the FPGA implementation. We have also realized a full VLSI implementation of the proposed SC-CNN architecture demonstrating that our optimization achieve a 18x area reduction over previous SC-DNN architecture VLSI implementation in a comparable technological node. For the first time, a fully-parallel CNN as LENET-5 is embedded and tested in a single FPGA, showing the benefits of using stochastic computing for embedded applications, in contrast to traditional binary logic implementations.
Abstract:Elementary cellular automata (ECA) is a widely studied one-dimensional processing methodology where the successive iteration of the automaton may lead to the recreation of a rich pattern dynamic. Recently, cellular automata have been proposed as a feasible way to implement Reservoir Computing (RC) systems in which the automata rule is fixed and the training is performed using a linear regression. In this work we perform an exhaustive study of the performance of the different ECA rules when applied to pattern recognition of time-independent input signals using a RC scheme. Once the different ECA rules have been tested, the most accurate one (rule 90) is selected to implement a digital circuit. Rule 90 is easily reproduced using a reduced set of XOR gates and shift-registers, thus representing a high-performance alternative for RC hardware implementation in terms of processing time, circuit area, power dissipation and system accuracy. The model (both in software and its hardware implementation) has been tested using a pattern recognition task of handwritten numbers (the MNIST database) for which we obtained competitive results in terms of accuracy, speed and power dissipation. The proposed model can be considered to be a low-cost method to implement fast pattern recognition digital circuits.