Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jan Moritz Joseph

A Calibratable Model for Fast Energy Estimation of MVM Operations on RRAM Crossbars

May 07, 2024

José Cubero-Cascante, Arunkumar Vaidyanathan, Rebecca Pelke, Lorenzo Pfeifer, Rainer Leupers, Jan Moritz Joseph

Figure 1 for A Calibratable Model for Fast Energy Estimation of MVM Operations on RRAM Crossbars

Figure 2 for A Calibratable Model for Fast Energy Estimation of MVM Operations on RRAM Crossbars

Figure 3 for A Calibratable Model for Fast Energy Estimation of MVM Operations on RRAM Crossbars

Figure 4 for A Calibratable Model for Fast Energy Estimation of MVM Operations on RRAM Crossbars

Abstract:The surge in AI usage demands innovative power reduction strategies. Novel Compute-in-Memory (CIM) architectures, leveraging advanced memory technologies, hold the potential for significantly lowering energy consumption by integrating storage with parallel Matrix-Vector-Multiplications (MVMs). This study addresses the 1T1R RRAM crossbar, a core component in numerous CIM architectures. We introduce an abstract model and a calibration methodology for estimating operational energy. Our tool condenses circuit-level behaviour into a few parameters, facilitating energy assessments for DNN workloads. Validation against low-level SPICE simulations demonstrates speedups of up to 1000x and energy estimations with errors below 1%.

* Presented at AICAS 2024, Abu Dhabi, UAE. 5 pages, 6 figures

Via

Access Paper or Ask Questions

CLSA-CIM: A Cross-Layer Scheduling Approach for Computing-in-Memory Architectures

Jan 17, 2024

Rebecca Pelke, Jose Cubero-Cascante, Nils Bosbach, Felix Staudigl, Rainer Leupers, Jan Moritz Joseph

Abstract:The demand for efficient machine learning (ML) accelerators is growing rapidly, driving the development of novel computing concepts such as resistive random access memory (RRAM)-based tiled computing-in-memory (CIM) architectures. CIM allows to compute within the memory unit, resulting in faster data processing and reduced power consumption. Efficient compiler algorithms are essential to exploit the potential of tiled CIM architectures. While conventional ML compilers focus on code generation for CPUs, GPUs, and other von Neumann architectures, adaptations are needed to cover CIM architectures. Cross-layer scheduling is a promising approach, as it enhances the utilization of CIM cores, thereby accelerating computations. Although similar concepts are implicitly used in previous work, there is a lack of clear and quantifiable algorithmic definitions for cross-layer scheduling for tiled CIM architectures. To close this gap, we present CLSA-CIM, a cross-layer scheduling algorithm for tiled CIM architectures. We integrate CLSA-CIM with existing weight-mapping strategies and compare performance against state-of-the-art (SOTA) scheduling algorithms. CLSA-CIM improves the utilization by up to 17.9 x , resulting in an overall speedup increase of up to 29.2 x compared to SOTA.

Via

Access Paper or Ask Questions

AIRCHITECT: Learning Custom Architecture Design and Mapping Space

Aug 16, 2021

Ananda Samajdar, Jan Moritz Joseph, Matthew Denton, Tushar Krishna

Figure 1 for AIRCHITECT: Learning Custom Architecture Design and Mapping Space

Figure 2 for AIRCHITECT: Learning Custom Architecture Design and Mapping Space

Figure 3 for AIRCHITECT: Learning Custom Architecture Design and Mapping Space

Figure 4 for AIRCHITECT: Learning Custom Architecture Design and Mapping Space

Abstract:Design space exploration is an important but costly step involved in the design/deployment of custom architectures to squeeze out maximum possible performance and energy efficiency. Conventionally, optimizations require iterative sampling of the design space using simulation or heuristic tools. In this paper we investigate the possibility of learning the optimization task using machine learning and hence using the learnt model to predict optimal parameters for the design and mapping space of custom architectures, bypassing any exploration step. We use three case studies involving the optimal array design, SRAM buffer sizing, mapping, and schedule determination for systolic-array-based custom architecture design and mapping space. Within the purview of these case studies, we show that it is possible to capture the design space and train a model to "generalize" prediction the optimal design and mapping parameters when queried with workload and design constraints. We perform systematic design-aware and statistical analysis of the optimization space for our case studies and highlight the patterns in the design space. We formulate the architecture design and mapping as a machine learning problem that allows us to leverage existing ML models for training and inference. We design and train a custom network architecture called AIRCHITECT, which is capable of learning the architecture design space with as high as 94.3% test accuracy and predicting optimal configurations which achieve on average (GeoMean) of 99.9% the best possible performance on a test dataset with $10^5$ GEMM workloads.

Via

Access Paper or Ask Questions