Anirudh Itagi

AIRCHITECT v2: Learning the Hardware Accelerator Design Space through Unified Representations

Jan 17, 2025

Jamin Seo, Akshat Ramachandran, Yu-Chuan Chuang, Anirudh Itagi, Tushar Krishna

Abstract: Design space exploration (DSE) plays a crucial role in enabling custom hardware architectures, particularly for emerging applications like AI, where optimized and specialized designs are essential. With the growing complexity of deep neural networks (DNNs) and the introduction of advanced foundational models (FMs), the design space for DNN accelerators is expanding at an exponential rate. Additionally, this space is highly non-uniform and non-convex, making it increasingly difficult to navigate and optimize. Traditional DSE techniques rely on search-based methods, which iteratively sample the design space to find the optimal solution. However, this process is time-consuming and often fails to converge to the global optimum for such design spaces. Recently, AIrchitect v1, the first attempt to address the limitations of search-based techniques, transformed DSE into a constant-time classification problem using recommendation networks. In this work, we propose AIrchitect v2, a more accurate and generalizable learning-based DSE technique applicable to large-scale design spaces that overcomes the shortcomings of earlier approaches. Specifically, we devise an encoder-decoder transformer model that (a) encodes the complex design space into a uniform intermediate representation using contrastive learning and (b) leverages a novel unified representation blending the advantages of classification and regression to effectively explore the large design space without sacrificing accuracy. Experimental results on 10^5 real DNN workloads demonstrate that, on average, AIrchitect v2 outperforms existing techniques by 15% in identifying optimal design points. Furthermore, to demonstrate the generalizability of our method, we evaluate performance on unseen model workloads (LLMs) and attain a 1.7x improvement in inference latency on the identified hardware architecture.

* Accepted to DATE 2025

Leveraging ASIC AI Chips for Homomorphic Encryption

Jan 13, 2025

Jianming Tong, Tianhao Huang, Leo de Castro, Anirudh Itagi, Jingtian Dang, Anupam Golder, Asra Ali, Jevin Jiang, Arvind, G. Edward Suh(+1 more)

Abstract: Cloud-based services are making the outsourcing of sensitive client data increasingly common. Although homomorphic encryption (HE) offers strong privacy guarantees, it requires substantially more resources than computing on plaintext, often leading to unacceptably large latencies in getting the results. HE accelerators have emerged to mitigate this latency issue, but they come with the high cost of ASICs. In this paper we show that HE primitives can be converted to AI operators and accelerated on existing ASIC AI accelerators, like TPUs, which are already widely deployed in the cloud. Adapting such accelerators for HE requires (1) supporting modular multiplication, (2) high-precision arithmetic in software, and (3) efficient mapping on matrix engines. We introduce the CROSS compiler, which (1) adopts Barrett reduction to provide modular reduction support using multipliers and adders, (2) uses Basis Aligned Transformation (BAT) to convert high-precision multiplication into low-precision matrix-vector multiplication, and (3) uses Matrix Aligned Transformation (MAT) to convert vectorized modular operations with reduction into matrix multiplications that can be efficiently processed on a 2D spatial matrix engine. Our evaluation of CROSS on a Google TPUv4 demonstrates significant performance improvements, with speedups of up to 161x and 5x compared to previous work on many-core CPUs and V100, respectively. The kernel-level code is open-sourced at https://github.com/google/jaxite.git.

* 16 pages, 10 figures, 4 algorithms, 7 tables. Enabling Google TPUv4 for privacy-preserving AI inference
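
The first two CROSS building blocks named in the abstract rest on standard integer techniques. The Python sketch below illustrates them under stated assumptions: a textbook Barrett reduction (modular reduction using only multiplies, shifts and subtractions) and a limb-decomposition multiply that turns one wide product into a low-precision matrix-vector product, in the spirit of BAT. It is not the CROSS/jaxite kernel code; the function names, the 8-bit limb width and the example modulus are hypothetical choices for illustration.

```python
def barrett_setup(q: int) -> tuple[int, int]:
    """Precompute k = bit length of q and mu = floor(4**k / q), once per modulus."""
    k = q.bit_length()
    mu = (1 << (2 * k)) // q
    return k, mu


def barrett_reduce(x: int, q: int, k: int, mu: int) -> int:
    """Return x mod q for 0 <= x < q*q using multiply, shift and subtract only."""
    if not 0 <= x < q * q:
        raise ValueError("this sketch expects 0 <= x < q*q (a product of reduced operands)")
    t = (x * mu) >> (2 * k)  # floor(x/q) - 1 <= t <= floor(x/q)
    r = x - t * q            # hence 0 <= r < 2q
    if r >= q:               # one conditional subtraction finishes the reduction
        r -= q
    return r


def to_limbs(x: int, n: int, w: int = 8) -> list[int]:
    """Split a non-negative x < 2**(n*w) into n little-endian w-bit limbs."""
    mask = (1 << w) - 1
    return [(x >> (w * i)) & mask for i in range(n)]


def toeplitz_from_limbs(a_limbs: list[int]) -> list[list[int]]:
    """(2n-1) x n matrix T with T[i][j] = a[i-j], so T @ b is the limb convolution of a and b."""
    n = len(a_limbs)
    return [[a_limbs[i - j] if 0 <= i - j < n else 0 for j in range(n)]
            for i in range(2 * n - 1)]


def wide_mul(a: int, b: int, n: int, w: int = 8) -> int:
    """Multiply two (n*w)-bit integers through a low-precision matrix-vector product."""
    t = toeplitz_from_limbs(to_limbs(a, n, w))
    b_limbs = to_limbs(b, n, w)
    # Each entry sums at most n products of w-bit limbs, so it fits in 2*w + ceil(log2(n)) bits.
    partial = [sum(tij * bj for tij, bj in zip(row, b_limbs)) for row in t]
    # Carry propagation: recombine the partial sums with their positional weights.
    return sum(p << (w * i) for i, p in enumerate(partial))


# Example with a hypothetical NTT-friendly prime modulus and 32-bit operands.
q = 998244353
k, mu = barrett_setup(q)
a, b = 123456789, 987654321
print(barrett_reduce(a * b, q, k, mu) == (a * b) % q)
print(wide_mul(a, b, n=4) == a * b)
```

Python's arbitrary-precision integers hide the overflow bookkeeping a real accelerator kernel must do; the point of the sketch is only that the expensive step, the Toeplitz matrix-vector product over 8-bit limbs, is the kind of low-precision operation a matrix engine handles natively.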

Medicine Strip Identification using 2-D Cepstral Feature Extraction and Multiclass Classification Methods

Feb 03, 2020

Anirudh Itagi, Ritam Sil, Saurav Mohapatra, Subham Rout, Bharath K P, Karthik R, Rajesh Kumar Muthu

Abstract: Misclassification of medicine is perilous to a patient's health, more so if the patient is visually impaired or simply does not recognize the color, shape or type of the medicine strip. This paper proposes a method for identifying medicine strips by 2-D cepstral analysis of their images, followed by classification using K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Logistic Regression (LR) classifiers. The extracted 2-D cepstral features are highly distinctive for each medicine strip and consequently make identification exceptionally accurate. This paper also proposes the Color Gradient and Pill shape Feature (CGPF) extraction procedure and discusses the Binary Robust Invariant Scalable Keypoints (BRISK) algorithm. These algorithms were implemented and their identification results compared.
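
The abstract does not spell out how the 2-D cepstral features are computed. A common definition of the real 2-D cepstrum is the inverse 2-D Fourier transform of the log magnitude spectrum of the image. The NumPy sketch below computes it, keeps a k x k block of low-quefrency coefficients as the feature vector, and uses a 1-nearest-neighbor lookup as a stand-in for the KNN classifier. The crop size `k`, the `eps` guard, the random stand-in images and all function names are assumptions for illustration, not the authors' pipeline.

```python
import numpy as np


def cepstrum_2d(img: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Real 2-D cepstrum: inverse FFT of the log magnitude spectrum of a grayscale image."""
    spectrum = np.fft.fft2(img.astype(np.float64))
    log_mag = np.log(np.abs(spectrum) + eps)  # eps avoids log(0) on empty frequency bins
    return np.real(np.fft.ifft2(log_mag))


def cepstral_features(img: np.ndarray, k: int = 8) -> np.ndarray:
    """Flatten the k x k low-quefrency corner of the cepstrum into a feature vector."""
    return cepstrum_2d(img)[:k, :k].ravel()


def nearest_neighbor(train_feats: np.ndarray, train_labels: list[str], query: np.ndarray) -> str:
    """1-NN classification by Euclidean distance in cepstral feature space."""
    dists = np.linalg.norm(train_feats - query, axis=1)
    return train_labels[int(np.argmin(dists))]


# Hypothetical stand-ins for two grayscale medicine-strip images.
rng = np.random.default_rng(0)
strips = {"strip_a": rng.random((32, 32)), "strip_b": rng.random((32, 32))}
labels = list(strips)
feats = np.stack([cepstral_features(strips[name]) for name in labels])
print(nearest_neighbor(feats, labels, cepstral_features(strips["strip_b"])))  # prints strip_b
```

One property that makes the cepstrum convenient here: scaling an image's intensity by a constant a > 0 adds log a to every log-magnitude bin, which after the inverse transform changes only the zero-quefrency coefficient at [0, 0] (up to the tiny `eps` guard), leaving the remaining features unchanged.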
