
Neelesh Gupta

TabConv: Low-Computation CNN Inference via Table Lookups

Apr 08, 2024

Neelesh Gupta, Narayanan Kannan, Pengmiao Zhang, Viktor Prasanna

Abstract: Convolutional Neural Networks (CNNs) have demonstrated remarkable ability throughout the field of computer vision. However, CNN inference requires a large number of arithmetic operations, making CNNs expensive to deploy in hardware. Current approaches alleviate this issue by developing hardware-supported, algorithmic processes to simplify spatial convolution functions. However, these methods still rely heavily on matrix multiplication, leading to significant computational overhead. To bridge the gap between hardware, algorithmic acceleration, and approximate matrix multiplication, we propose TabConv, a novel, table-based approximation for convolution that significantly reduces arithmetic operations during inference. Additionally, we introduce a priority masking technique based on cosine similarity to select layers for table-based approximation, thereby maintaining model performance. We evaluate our approach on popular CNNs: ResNet-18, ResNet-34, and NetworkInNetwork (NIN). TabConv preserves over 93% of the original model's performance while reducing arithmetic operations by 36.5%, 25.8%, and 99.4% for ResNet-18 on CIFAR-10, CIFAR-100, and MNIST, respectively; 35.6% and 99.3% for ResNet-34 on CIFAR-10 and MNIST; and 98.9% for NIN on MNIST, achieving low-computation inference.

* 8 pages, Accepted at CF '24
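The abstract does not say how the tables are built or how layers are scored. The following is a minimal sketch of the general idea, assuming a product-quantization-style scheme: a convolution is lowered to a matrix-vector product, the input is split into subspaces, each sub-vector is snapped to one of a few prototypes, and prototype-times-weight partial products are precomputed offline so inference only looks up and adds. The priority mask is assumed to keep the layers whose approximate output has the highest cosine similarity to the exact output. The names `build_tables`, `table_matvec`, and `priority_mask` are hypothetical and are not TabConv's code. Note that the nearest-prototype encoder used here still performs distance arithmetic; a cheaper encoder would be a separate design choice.

```python
import math

def nearest(prototypes, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda k: sum((p - v) ** 2 for p, v in zip(prototypes[k], x)))

def build_tables(prototypes_per_sub, weights, sub_len):
    """Hypothetical offline step: for subspace s, prototype k and output row j, store
    the partial dot product prototype_k . weights[j][s*sub_len:(s+1)*sub_len]."""
    tables = []
    for s, protos in enumerate(prototypes_per_sub):
        lo, hi = s * sub_len, (s + 1) * sub_len
        tables.append([[sum(p * w for p, w in zip(proto, row[lo:hi])) for row in weights]
                       for proto in protos])
    return tables

def table_matvec(x, prototypes_per_sub, tables, sub_len):
    """Approximate weights @ x: encode each sub-vector, then add looked-up entries."""
    y = [0.0] * len(tables[0][0])
    for s, protos in enumerate(prototypes_per_sub):
        code = nearest(protos, x[s * sub_len:(s + 1) * sub_len])
        for j in range(len(y)):
            y[j] += tables[s][code][j]  # lookup + add, no multiply
    return y

def cosine(a, b):
    """Cosine similarity between an exact and an approximate layer output."""
    dot = sum(u * v for u, v in zip(a, b))
    return dot / (math.sqrt(sum(u * u for u in a)) * math.sqrt(sum(v * v for v in b)))

def priority_mask(similarity_by_layer, budget):
    """Assumed rule: table-approximate the `budget` layers whose approximate output
    is most similar to the exact one; keep every other layer exact."""
    ranked = sorted(similarity_by_layer, key=similarity_by_layer.get, reverse=True)
    chosen = set(ranked[:budget])
    return {name: name in chosen for name in similarity_by_layer}

# Toy example: 2x4 weight matrix, two 2-dim subspaces with two prototypes each.
W = [[1.0, 2.0, 0.0, 1.0],
     [0.5, -1.0, 2.0, 0.0]]
protos = [[[1.0, 0.0], [0.0, 1.0]],  # subspace 0
          [[1.0, 1.0], [0.0, 0.0]]]  # subspace 1
tables = build_tables(protos, W, sub_len=2)
x = [0.9, 0.1, 1.0, 1.0]
approx = table_matvec(x, protos, tables, sub_len=2)
exact = [sum(w * v for w, v in zip(row, x)) for row in W]
print(approx, round(cosine(exact, approx), 4))
```

The table size grows with the number of subspaces, prototypes, and output channels, so in a scheme like this the trade-off is memory for arithmetic; ranking layers by cosine similarity is one simple way to decide where that trade is safe.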


PaCKD: Pattern-Clustered Knowledge Distillation for Compressing Memory Access Prediction Models

Feb 21, 2024

Neelesh Gupta, Pengmiao Zhang, Rajgopal Kannan, Viktor Prasanna


Abstract: Deep neural networks (DNNs) have proven to be effective models for accurate Memory Access Prediction (MAP), a critical task in mitigating memory latency through data prefetching. However, existing DNN-based MAP models suffer from challenges such as large physical storage requirements and high inference latency, primarily due to their large number of parameters. These limitations render them impractical for deployment in real-world scenarios. In this paper, we propose PaCKD, a Pattern-Clustered Knowledge Distillation approach to compress MAP models while maintaining prediction performance. The PaCKD approach encompasses three steps: clustering memory access sequences into distinct partitions involving similar patterns, training large pattern-specific teacher models for memory access prediction on each partition, and training a single lightweight student model by distilling the knowledge from the trained pattern-specific teachers. We evaluate our approach on LSTM, MLP-Mixer, and ResNet models, which exhibit diverse structures and are widely used for image classification tasks, to test their effectiveness on four widely used graph applications. Compared to the teacher models with 5.406M parameters and an F1-score of 0.4626, our student models achieve 552× model size compression while maintaining an F1-score of 0.4538 (a 1.92% performance drop). Our approach yields an 8.70% higher result than student models trained with standard knowledge distillation and an 8.88% higher result than student models trained without any form of knowledge distillation.

* 2023 IEEE High Performance Extreme Computing Conference (HPEC), 2023, pp. 1-7
* 6 pages, 2 figures, HPEC '23
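The three PaCKD steps can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the paper's implementation: each memory access sequence is assumed to be summarized by a numeric feature vector (the abstract does not say which features are clustered), plain k-means stands in for the clustering step, outputs are treated as multi-label probabilities (one sigmoid per output bit, consistent with the F1-score metric), and the distillation loss is assumed to be a weighted mix of hard-label and teacher-soft-target binary cross-entropy with a hypothetical weight `alpha`. Teachers are represented by placeholder callables.

```python
import math
import random

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Step 1 (assumed k-means): partition sequence feature vectors into k clusters."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce(p, t, eps=1e-7):
    """Binary cross-entropy of prediction p against (possibly soft) target t."""
    p = min(max(p, eps), 1.0 - eps)
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))

def soft_targets(x, cluster_id, teachers):
    """Step 2 models: route a sample to the teacher trained on its cluster."""
    return teachers[cluster_id](x)

def distill_loss(student_logits, labels, teacher_probs, alpha=0.5):
    """Step 3 (assumed loss form): mix hard-label BCE with BCE against the
    pattern-specific teacher's soft targets, averaged over output bits."""
    n = len(student_logits)
    hard = sum(bce(sigmoid(z), y) for z, y in zip(student_logits, labels)) / n
    soft = sum(bce(sigmoid(z), q) for z, q in zip(student_logits, teacher_probs)) / n
    return alpha * hard + (1.0 - alpha) * soft

# Toy usage with hypothetical features and stand-in teachers.
feats = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
clusters, _ = kmeans(feats, k=2)
teachers = [lambda x: [0.9, 0.2], lambda x: [0.1, 0.8]]
q = soft_targets(feats[2], clusters[2], teachers)
loss = distill_loss([1.5, -0.5], [1, 0], q)
print(clusters, q, round(loss, 4))
```

Because each sample is paired with the teacher that specialized on its pattern cluster, the single student sees soft targets from whichever teacher is most relevant, rather than from one generalist teacher.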


Attention, Distillation, and Tabularization: Towards Practical Neural Network-Based Prefetching

Jan 16, 2024

Pengmiao Zhang, Neelesh Gupta, Rajgopal Kannan, Viktor K. Prasanna


Abstract: Attention-based Neural Networks (NNs) have demonstrated their effectiveness in accurate memory access prediction, an essential step in data prefetching. However, the substantial computational overheads associated with these models result in high inference latency, limiting their feasibility as practical prefetchers. To close the gap, we propose a new approach based on tabularization that significantly reduces model complexity and inference latency without sacrificing prediction accuracy. Our novel tabularization methodology takes as input a distilled, yet highly accurate, attention-based model for memory access prediction and efficiently converts its expensive matrix multiplications into a hierarchy of fast table lookups. As an exemplar of this approach, we develop DART, a prefetcher composed of a simple hierarchy of tables. With a modest 0.09 drop in F1-score, DART eliminates 99.99% of the arithmetic operations of the large attention-based model and 91.83% of those of the distilled model. DART accelerates inference by 170x relative to the large model and by 9.4x relative to the distilled model. DART has latency and storage costs comparable to the state-of-the-art rule-based prefetcher BO but surpasses it by 6.1% in IPC improvement. DART outperforms the state-of-the-art NN-based prefetchers TransFetch by 33.1% and Voyager by 37.2% in terms of IPC improvement, primarily due to its low prefetching latency.
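The abstract describes the conversion into "a hierarchy of fast table lookups" only at a high level. A minimal sketch of the general idea, assuming a toy two-layer model (not attention) whose input is a short sequence of tokens from a small vocabulary, for example bucketed address deltas: level 1 precomputes each position's contribution to the hidden layer for every token, so inference only adds looked-up vectors; level 2 snaps the hidden vector to the nearest of a few prototypes and reads a precomputed output. All names (`build_hierarchy`, `table_infer`) and the table layout are hypothetical; DART's actual tables, including how attention is tabularized, are described in the paper.

```python
def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def build_hierarchy(embeddings, W1_blocks, prototypes, W2):
    """Hypothetical offline folding of a 2-layer model into two table levels.
    Level 1: T1[pos][token] = W1_blocks[pos] @ embeddings[token]
    Level 2: T2[k] = W2 @ relu(prototypes[k])"""
    T1 = [{tok: matvec(Wb, emb) for tok, emb in embeddings.items()} for Wb in W1_blocks]
    T2 = [matvec(W2, relu(p)) for p in prototypes]
    return T1, T2

def table_infer(tokens, T1, prototypes, T2):
    # Level 1: sum the looked-up hidden contributions (additions only).
    hidden = [0.0] * len(prototypes[0])
    for pos, tok in enumerate(tokens):
        hidden = [h + t for h, t in zip(hidden, T1[pos][tok])]
    # Level 2: encode the hidden vector as a prototype index, then one lookup.
    code = min(range(len(prototypes)), key=lambda k: sq_dist(hidden, prototypes[k]))
    return T2[code]

def exact_infer(tokens, embeddings, W1_blocks, W2):
    """Reference model computed with full matrix multiplications."""
    hidden = [0.0] * len(W1_blocks[0])
    for pos, tok in enumerate(tokens):
        hidden = [h + t for h, t in zip(hidden, matvec(W1_blocks[pos], embeddings[tok]))]
    return matvec(W2, relu(hidden))

# Toy model: 2 input positions, vocabulary {"a", "b"}, hidden size 2, 1 output.
embeddings = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
W1_blocks = [[[1, 0], [0, 1]],   # position 0: identity
             [[0, 1], [1, 0]]]   # position 1: swap
W2 = [[1.0, -1.0]]
prototypes = [[1.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
T1, T2 = build_hierarchy(embeddings, W1_blocks, prototypes, W2)
print(table_infer(["a", "b"], T1, prototypes, T2), exact_infer(["a", "b"], embeddings, W1_blocks, W2))
```

In this toy case the prototypes cover every reachable hidden vector, so the tables reproduce the exact model; with a realistic model the prototypes only approximate the hidden space, which is where an accuracy loss such as the reported F1-score drop would come from.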
