Md Rubel Ahmed

EdgeProfiler: A Fast Profiling Framework for Lightweight LLMs on Edge Using Analytical Model

Jun 06, 2025

Alyssa Pinnock, Shakya Jayakody, Kawsher A Roxy, Md Rubel Ahmed

Abstract: This paper introduces EdgeProfiler, a fast profiling framework designed for evaluating lightweight Large Language Models (LLMs) on edge systems. While LLMs offer remarkable capabilities in natural language understanding and generation, their high computational, memory, and power requirements often confine them to cloud environments. EdgeProfiler addresses these challenges by providing a systematic methodology for assessing LLM performance in resource-constrained edge settings. The framework profiles compact LLMs, including TinyLLaMA, Gemma3.1B, Llama3.2-1B, and DeepSeek-r1-1.5B, using aggressive quantization techniques and strict memory constraints. Analytical modeling is used to estimate latency, FLOPs, and energy consumption. The profiling reveals that 4-bit quantization reduces model memory usage by approximately 60-70%, while maintaining accuracy within 2-5% of full-precision baselines. Inference speeds are observed to improve by 2-3x compared to FP16 baselines across various edge devices. Power modeling estimates a 35-50% reduction in energy consumption for INT4 configurations, enabling practical deployment on hardware such as Raspberry Pi 4/5 and Jetson Orin Nano Super. Our findings emphasize the importance of efficient profiling tailored to lightweight LLMs in edge environments, balancing accuracy, energy efficiency, and computational feasibility.

* 4 figures, 7 pages, IEEE conference template
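
As a rough illustration of what analytical estimation of memory, FLOPs, latency, and energy can look like, here is a minimal sketch. It is not the EdgeProfiler implementation. It assumes a weights-only memory model, the common rule of thumb of about 2 FLOPs per parameter per generated token, and a roofline-style latency bound (the slower of compute time and weight-streaming time). The `Device` fields and all numeric values are hypothetical, so the outputs will not match the figures reported in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical edge device description (all values are assumptions)."""
    peak_flops: float     # sustained FLOP/s
    mem_bandwidth: float  # bytes/s
    power_w: float        # average power draw in watts


def weight_bytes(n_params: int, bits: int) -> float:
    """Memory for the weights alone; ignores KV cache and activations."""
    return n_params * bits / 8


def flops_per_token(n_params: int) -> float:
    """Rule-of-thumb estimate: ~2 FLOPs per parameter per generated token."""
    return 2.0 * n_params


def token_latency(n_params: int, bits: int, dev: Device) -> float:
    """Roofline-style bound: the slower of compute and weight streaming."""
    compute_s = flops_per_token(n_params) / dev.peak_flops
    memory_s = weight_bytes(n_params, bits) / dev.mem_bandwidth
    return max(compute_s, memory_s)


def token_energy(n_params: int, bits: int, dev: Device) -> float:
    """Energy per token in joules, assuming constant power draw."""
    return dev.power_w * token_latency(n_params, bits, dev)


if __name__ == "__main__":
    dev = Device(peak_flops=1e11, mem_bandwidth=1e10, power_w=10.0)
    n_params = 1_000_000_000  # a hypothetical 1B-parameter model
    for bits in (16, 4):
        gb = weight_bytes(n_params, bits) / 1e9
        lat = token_latency(n_params, bits, dev)
        joules = token_energy(n_params, bits, dev)
        print(f"{bits}-bit: {gb:.2f} GB, {lat:.3f} s/token, {joules:.2f} J/token")
```

In this toy setting the model is memory-bound at both precisions, so lowering the bit width shortens the estimated per-token latency and energy in direct proportion to the weight size. A real profile would also account for the KV cache, activations, and dequantization overhead.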

Data Augmentation for Image Classification using Generative AI

Aug 31, 2024

Fazle Rahat, M Shifat Hossain, Md Rubel Ahmed, Sumit Kumar Jha, Rickard Ewetz

Abstract: Scaling laws dictate that the performance of AI models is proportional to the amount of available data. Data augmentation is a promising solution for expanding the dataset size. Traditional approaches focused on augmentation using rotation, translation, and resizing. Recent approaches use generative AI models to improve dataset diversity. However, the generative methods struggle with issues such as subject corruption and the introduction of irrelevant artifacts. In this paper, we propose Automated Generative Data Augmentation (AGA). The framework combines the utility of large language models (LLMs), diffusion models, and segmentation models to augment data. AGA preserves foreground authenticity while ensuring background diversity. Specific contributions include: i) segment and superclass based object extraction, ii) prompt diversity with combinatorial complexity using prompt decomposition, and iii) affine subject manipulation. We evaluate AGA against state-of-the-art (SOTA) techniques on three representative datasets: ImageNet, CUB, and iWildCam. The experimental evaluation demonstrates accuracy improvements of 15.6% and 23.5% on in-distribution and out-of-distribution data, respectively, compared to baseline models. There is also a 64.3% improvement in SIC score compared to the baselines.

* 19 pages, 15 figures, 4 tables
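
One plausible reading of "prompt diversity with combinatorial complexity using prompt decomposition" is that a background prompt is split into independent parts whose options are then recombined. The sketch below illustrates only that idea and is not the AGA pipeline. The slot names, the option lists, and the prompt template are hypothetical.

```python
from itertools import product


def compose_prompts(subject: str, slots: dict[str, list[str]]) -> list[str]:
    """Build one prompt per combination of slot options (Cartesian product)."""
    option_lists = [slots[name] for name in slots]  # keeps insertion order
    return [
        f"a photo of a {subject} " + " ".join(combo)
        for combo in product(*option_lists)
    ]


if __name__ == "__main__":
    # Hypothetical decomposition of a background description into three slots.
    slots = {
        "location": ["in a forest", "on a beach", "in a city street"],
        "time": ["at dawn", "at night"],
        "weather": ["in fog", "under clear skies"],
    }
    prompts = compose_prompts("bird", slots)
    print(len(prompts))  # 3 * 2 * 2 combinations
    print(prompts[0])
```

The number of distinct prompts grows as the product of the slot sizes, so a few short option lists already yield many different backgrounds for the same subject.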

AutoHLS: Learning to Accelerate Design Space Exploration for HLS Designs

Mar 15, 2024

Md Rubel Ahmed, Toshiaki Koike-Akino, Kieran Parsons, Ye Wang

Abstract: High-level synthesis (HLS) is a design flow that leverages modern language features and flexibility, such as complex data structures, inheritance, templates, etc., to prototype hardware designs rapidly. However, exploring various design space parameters can take much time and effort for hardware engineers to meet specific design specifications. This paper proposes a novel framework called AutoHLS, which integrates a deep neural network (DNN) with Bayesian optimization (BO) to accelerate HLS hardware design optimization. Our tool focuses on HLS pragma exploration and operation transformation. It utilizes integrated DNNs to predict synthesizability within a given FPGA resource budget. We also investigate the potential of emerging quantum neural networks (QNNs) instead of classical DNNs for the AutoHLS pipeline. Our experimental results demonstrate up to a 70-fold speedup in exploration time.

* 5 pages, 6 figures, MWSCAS 2023

Mining SoC Message Flows with Attention Model

Sep 12, 2022

Md Rubel Ahmed, Bardia Nadimi, Hao Zheng

Abstract: High-quality system-level message flow specifications are necessary for comprehensive validation of system-on-chip (SoC) designs. However, manual development and maintenance of such specifications are daunting tasks. We propose a disruptive method that utilizes deep sequence modeling with the attention mechanism to infer accurate flow specifications from SoC communication traces. The proposed method can overcome the inherent complexity of SoC traces induced by the concurrent executions of SoC designs that existing mining tools often find extremely challenging. We conduct experiments on five highly concurrent traces and find that the proposed approach outperforms several existing state-of-the-art trace mining tools.

* 7 pages

Deep Bidirectional Transformers for SoC Flow Specification Mining

Mar 09, 2022

Md Rubel Ahmed, Hao Zheng

Abstract: High-quality system-level message flow specifications can lead to comprehensive validation of system-on-chip (SoC) designs. We propose a disruptive method that utilizes an attention mechanism to produce accurate flow specifications from SoC IP communication traces. The proposed method can overcome the inherent complexity of SoC traces induced by the concurrency and parallelism of multicore designs that existing flow specification mining tools often find extremely challenging. Experiments on highly interleaved traces show promising flow reconstruction compared to several tools dedicated to the flow specification mining problem.

* 2 pages short paper

Model Synthesis for Communication Traces of System-on-Chip Designs

Feb 13, 2021

Hao Zheng, Md Rubel Ahmed, Parijat Mukherjee, Mahesh C. Ketkar, Jin Yang

Abstract: Concise and abstract models of system-level behaviors are invaluable in design analysis, testing, and validation. In this paper, we consider the problem of inferring models from communication traces of system-on-chip (SoC) designs. The traces capture communications among different blocks of an SoC design in terms of messages exchanged. The extracted models characterize the system-level communication protocols governing how blocks exchange messages and coordinate with each other to realize various system functions. The above problem is formulated as a constraint satisfaction problem, which is then fed to an SMT solver. The solutions returned by the SMT solver are used to extract the models that accept the input traces. In the experiments, we demonstrate the proposed approach with traces collected from a transaction-level simulation model of a multicore SoC design and traces of a more detailed multicore SoC design developed in the GEM5 environment.
