Abstract:The increasing demand for processing large volumes of data for machine learning models has pushed data bandwidth requirements beyond the capability of traditional von Neumann architecture. In-memory computing (IMC) has recently emerged as a promising solution to address this gap by enabling distributed data storage and processing at the micro-architectural level, significantly reducing both latency and energy. In this paper, we present the IMPACT: InMemory ComPuting Architecture Based on Y-FlAsh Technology for Coalesced Tsetlin Machine Inference, underpinned on a cutting-edge memory device, Y-Flash, fabricated on a 180 nm CMOS process. Y-Flash devices have recently been demonstrated for digital and analog memory applications, offering high yield, non-volatility, and low power consumption. The IMPACT leverages the Y-Flash array to implement the inference of a novel machine learning algorithm: coalesced Tsetlin machine (CoTM) based on propositional logic. CoTM utilizes Tsetlin automata (TA) to create Boolean feature selections stochastically across parallel clauses. The IMPACT is organized into two computational crossbars for storing the TA and weights. Through validation on the MNIST dataset, IMPACT achieved 96.3% accuracy. The IMPACT demonstrated improvements in energy efficiency, e.g., 2.23X over CNN-based ReRAM, 2.46X over Neuromorphic using NOR-Flash, and 2.06X over DNN-based PCM, suited for modern ML inference applications.
Abstract:The modern implementation of machine learning architectures faces significant challenges due to frequent data transfer between memory and processing units. In-memory computing, primarily through memristor-based analog computing, offers a promising solution to overcome this von Neumann bottleneck. In this technology, data processing and storage are located inside the memory. Here, we introduce a novel approach that utilizes floating-gate Y-Flash memristive devices manufactured with a standard 180 nm CMOS process. These devices offer attractive features, including analog tunability and moderate device-to-device variation; such characteristics are essential for reliable decision-making in ML applications. This paper uses a new machine learning algorithm, the Tsetlin Machine (TM), for in-memory processing architecture. The TM's learning element, Automaton, is mapped into a single Y-Flash cell, where the Automaton's range is transferred into the Y-Flash's conductance scope. Through comprehensive simulations, the proposed hardware implementation of the learning automata, particularly for Tsetlin machines, has demonstrated enhanced scalability and on-edge learning capabilities.
Abstract:In-memory computing for Machine Learning (ML) applications remedies the von Neumann bottlenecks by organizing computation to exploit parallelism and locality. Non-volatile memory devices such as Resistive RAM (ReRAM) offer integrated switching and storage capabilities showing promising performance for ML applications. However, ReRAM devices have design challenges, such as non-linear digital-analog conversion and circuit overheads. This paper proposes an In-Memory Boolean-to-Current Inference Architecture (IMBUE) that uses ReRAM-transistor cells to eliminate the need for such conversions. IMBUE processes Boolean feature inputs expressed as digital voltages and generates parallel current paths based on resistive memory states. The proportional column current is then translated back to the Boolean domain for further digital processing. The IMBUE architecture is inspired by the Tsetlin Machine (TM), an emerging ML algorithm based on intrinsically Boolean logic. The IMBUE architecture demonstrates significant performance improvements over binarized convolutional neural networks and digital TM in-memory implementations, achieving up to a 12.99x and 5.28x increase, respectively.