Siddharth Garg

NYU Tandon School of Engineering

VeriLoC: Line-of-Code Level Prediction of Hardware Design Quality from Verilog Code

Jun 08, 2025

Raghu Vamshi Hemadri, Jitendra Bhandari, Johann Knechtel, Badri P Gopalan, Ramesh Narayanaswamy, Ramesh Karri, Siddharth Garg

Abstract: Modern chip design is complex, and there is a crucial need for early-stage prediction of key design-quality metrics like timing and routing congestion directly from Verilog code (a commonly used programming language for hardware design). It is especially important yet complex to predict individual lines of code that cause timing violations or downstream routing congestion. Prior works have tried approaches like converting Verilog into an intermediate graph representation and using LLM embeddings alongside other features to predict module-level quality, but did not consider line-level quality prediction. We propose VeriLoC, the first method that predicts design quality directly from Verilog at both the line- and module-level. To this end, VeriLoC leverages recent Verilog code-generation LLMs to extract local line-level and module-level embeddings, and trains downstream classifiers/regressors on concatenations of these embeddings. VeriLoC achieves high F1-scores of 0.86-0.95 for line-level congestion and timing prediction, and reduces the mean average percentage error from 14-18% for SOTA methods down to only 4%. We believe that VeriLoC embeddings and insights from our work will also be of value for other predictive and optimization tasks for complex hardware design.

Can Reasoning Models Reason about Hardware? An Agentic HLS Perspective

Mar 17, 2025

Luca Collini, Andrew Hennessee, Ramesh Karri, Siddharth Garg

Abstract: Recent Large Language Models (LLMs) such as OpenAI o3-mini and DeepSeek-R1 use enhanced reasoning through Chain-of-Thought (CoT). Their potential in hardware design, which relies on expert-driven iterative optimization, remains unexplored. This paper investigates whether reasoning LLMs can address challenges in High-Level Synthesis (HLS) design space exploration and optimization. During HLS, engineers manually define pragmas/directives to balance performance and resource constraints. We propose an LLM-based optimization agentic framework that automatically restructures code, inserts pragmas, and identifies optimal design points via feedback from HLS tools and access to integer-linear programming (ILP) solvers. Experiments compare reasoning models against conventional LLMs on benchmarks using success rate, efficiency, and design quality (area/latency) metrics, and provide the first-ever glimpse into the CoTs produced by a powerful open-source reasoning model like DeepSeek-R1.

* 7 pages, submitted for peer review

Huff-LLM: End-to-End Lossless Compression for Efficient LLM Inference

Feb 02, 2025

Patrick Yubeaton, Tareq Mahmoud, Shehab Naga, Pooria Taheri, Tianhua Xia, Arun George, Yasmein Khalil, Sai Qian Zhang, Siddharth Joshi, Chinmay Hegde (+1 more)

Abstract: As they become more capable, large language models (LLMs) have continued to rapidly increase in size. This has exacerbated the difficulty of running state-of-the-art LLMs on small edge devices. Standard techniques address this problem through compression methods such as quantization or pruning. However, such techniques are lossy and have been shown to change model behavior in unpredictable ways. We propose Huff-LLM, an end-to-end, lossless model compression method that lets users store LLM weights in compressed format everywhere: cloud, disk, main memory, and even on-chip memory/buffers. This not only allows larger models to be loaded in main memory, but also reduces the bandwidth required to load weights on chip and makes more efficient use of on-chip weight buffers. In addition to the memory savings achieved via compression, we also show latency and energy efficiency improvements when performing inference with the compressed model.
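The abstract does not spell out the coding scheme, although the name suggests Huffman coding. As a rough illustration of what lossless compression of weight values means, the following is a minimal sketch of generic Huffman coding over a hypothetical list of 8-bit weight values. The function names and the sample data are assumptions for illustration and are not taken from the paper.

```python
import heapq
from collections import Counter

def build_huffman_code(symbols):
    """Return a dict mapping each symbol to a prefix-free bit string."""
    freq = Counter(symbols)
    if len(freq) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 / 1.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(symbols, code):
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical 8-bit weight values with a skewed distribution.
weights = [0, 0, 0, 0, 1, 1, 255, 0, 1, 2, 0, 0]
code = build_huffman_code(weights)
bits = encode(weights, code)
restored = decode(bits, code)
```

Because decoding recovers the exact input, model behavior is unchanged; the saving comes only from frequent values receiving shorter codes.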

Towards Unified Benchmark and Models for Multi-Modal Perceptual Metrics

Dec 13, 2024

Sara Ghazanfari, Siddharth Garg, Nicolas Flammarion, Prashanth Krishnamurthy, Farshad Khorrami, Francesco Croce

Abstract: Human perception of similarity across uni- and multimodal inputs is highly complex, making it challenging to develop automated metrics that accurately mimic it. General purpose vision-language models, such as CLIP and large multi-modal models (LMMs), can be applied as zero-shot perceptual metrics, and several recent works have developed models specialized in narrow perceptual tasks. However, the extent to which existing perceptual metrics align with human perception remains unclear. To investigate this question, we introduce UniSim-Bench, a benchmark encompassing 7 multi-modal perceptual similarity tasks, with a total of 25 datasets. Our evaluation reveals that while general-purpose models perform reasonably well on average, they often lag behind specialized models on individual tasks. Conversely, metrics fine-tuned for specific tasks fail to generalize well to unseen, though related, tasks. As a first step towards a unified multi-task perceptual similarity metric, we fine-tune both encoder-based and generative vision-language models on a subset of the UniSim-Bench tasks. This approach yields the highest average performance, and in some cases, even surpasses task-specific models. Nevertheless, these models still struggle with generalization to unseen tasks, highlighting the ongoing challenge of learning a robust, unified perceptual similarity metric capable of capturing the human notion of similarity. The code and models are available at https://github.com/SaraGhazanfari/UniSim.

Out-of-Distribution Detection with Overlap Index

Dec 09, 2024

Hao Fu, Prashanth Krishnamurthy, Siddharth Garg, Farshad Khorrami

Abstract: Out-of-distribution (OOD) detection is crucial for the deployment of machine learning models in the open world. While existing OOD detectors are effective in identifying OOD samples that deviate significantly from in-distribution (ID) data, they often come with trade-offs. For instance, deep OOD detectors usually suffer from high computational costs, require tuning hyperparameters, and have limited interpretability, whereas traditional OOD detectors may have low accuracy on large high-dimensional datasets. To address these limitations, we propose a novel, effective OOD detection approach that employs an overlap index (OI)-based confidence score function to evaluate the likelihood of a given input belonging to the same distribution as the available ID samples. The proposed OI-based confidence score function is non-parametric, lightweight, and easy to interpret, hence providing strong flexibility and generality. Extensive empirical evaluations indicate that our OI-based OOD detector is competitive with state-of-the-art OOD detectors in terms of detection accuracy on a wide range of datasets while incurring lower computation and memory costs. Lastly, we show that the proposed OI-based confidence score function inherits nice properties from OI (e.g., insensitivity to small distributional variations and robustness against Huber $\epsilon$-contamination) and is a versatile tool for estimating OI and model accuracy in specific contexts.

PrefixLLM: LLM-aided Prefix Circuit Design

Dec 03, 2024

Weihua Xiao, Venkata Sai Charan Putrevu, Raghu Vamshi Hemadri, Siddharth Garg, Ramesh Karri

Abstract: Prefix circuits are fundamental components in digital adders, widely used in digital systems due to their efficiency in calculating carry signals. Synthesizing prefix circuits with minimized area and delay is crucial for enhancing the performance of modern computing systems. Recently, large language models (LLMs) have demonstrated a surprising ability to perform text generation tasks. We propose PrefixLLM, which leverages LLMs for prefix circuit synthesis. PrefixLLM transforms the prefix circuit synthesis task into a structured text generation problem, termed the Structured Prefix Circuit Representation (SPCR), and introduces an iterative framework to automatically and accurately generate valid SPCRs. We further present a design space exploration (DSE) framework that uses LLMs to iteratively search for area- and delay-optimized prefix circuits. Compared to the state of the art, PrefixLLM can reduce the area by 3.70% under the same delay constraint. This work highlights the use of LLMs in the synthesis of arithmetic circuits, which can be transformed into structured text generation problems.
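To make concrete how a prefix circuit calculates carry signals, here is a minimal software sketch of one well-known prefix topology, the Kogge-Stone network, applied to unsigned addition. It is a generic textbook construction and is not PrefixLLM's SPCR format or the authors' synthesis flow; the function name and the default `width` are assumptions for illustration.

```python
def kogge_stone_add(a, b, width=8):
    """Add two unsigned integers using a Kogge-Stone parallel-prefix carry network."""
    # Per-bit generate (both inputs 1) and propagate (exactly one input 1) signals.
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(width)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(width)]
    sum_p = p[:]  # keep the original propagate bits for the final sum

    dist = 1
    while dist < width:
        new_g, new_p = g[:], p[:]
        for i in range(dist, width):
            # Prefix operator: (g, p) o (g', p') = (g | (p & g'), p & p')
            new_g[i] = g[i] | (p[i] & g[i - dist])
            new_p[i] = p[i] & p[i - dist]
        g, p = new_g, new_p
        dist *= 2  # each stage doubles the span covered by every node

    # g[i] is now the carry out of bit i, so the carry into bit i is g[i - 1].
    carries = [0] + g[:-1]
    total = sum((sum_p[i] ^ carries[i]) << i for i in range(width))
    return total | (g[-1] << width)  # append the final carry-out bit

result = kogge_stone_add(200, 100)
```

Each pass of the `while` loop corresponds to one logic level, so the depth grows with the logarithm of the bit width; other prefix topologies trade that depth against node count, which is the kind of area and delay trade-off the abstract refers to.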

TruncFormer: Private LLM Inference Using Only Truncations

Dec 02, 2024

Patrick Yubeaton, Jianqiao Cambridge Mo, Karthik Garimella, Nandan Kumar Jha, Brandon Reagen, Chinmay Hegde, Siddharth Garg

Abstract: Private inference (PI) plays an important role in guaranteeing the privacy of user data when interfacing with proprietary machine learning models such as LLMs. However, PI remains practically intractable due to the massive latency costs associated with nonlinear functions present in LLMs. Existing works have focused on improving the latency of specific LLM nonlinearities (such as the Softmax, or the GeLU) via approximations. However, new types of nonlinearities are regularly introduced with new LLM architectures, and this has led to a constant game of catch-up where PI researchers attempt to optimize the newest nonlinear function. We introduce TruncFormer, a framework for taking any LLM and transforming it into a plaintext emulation of PI. Our framework leverages the fact that nonlinearities in LLMs are differentiable and can be accurately approximated with a sequence of additions, multiplications, and truncations. Further, we decouple the add/multiply and truncation operations, and statically determine where truncations should be inserted based on a given field size and input representation size. This leads to latency improvements over existing cryptographic protocols that enforce truncation after every multiplication operation. We open source our code for community use.
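The difference between truncating after every multiplication and truncating once after a run of multiply-adds can be illustrated with plain fixed-point integers. The sketch below uses no cryptography and is not the TruncFormer implementation; `FRAC_BITS`, the helper names, and the sample vectors are assumptions chosen for illustration.

```python
FRAC_BITS = 8  # hypothetical number of fractional bits in the fixed-point format

def to_fixed(x):
    """Encode a real number as an integer scaled by 2**FRAC_BITS."""
    return int(round(x * (1 << FRAC_BITS)))

def from_fixed(v):
    """Decode a fixed-point integer back to a float."""
    return v / (1 << FRAC_BITS)

def truncate(v, bits=FRAC_BITS):
    """Drop the low bits; a product of two fixed-point values carries 2*FRAC_BITS."""
    return v >> bits

def dot_truncate_each(xs, ys):
    # Baseline: one truncation after every multiplication.
    return sum(truncate(x * y) for x, y in zip(xs, ys))

def dot_truncate_once(xs, ys):
    # Decoupled: accumulate full-width products, then truncate a single time.
    # The accumulator must have enough headroom for the untruncated sum.
    return truncate(sum(x * y for x, y in zip(xs, ys)))

xs = [to_fixed(v) for v in (0.5, 0.25, -1.5)]
ys = [to_fixed(v) for v in (2.0, 4.0, 0.5)]
each = from_fixed(dot_truncate_each(xs, ys))
once = from_fixed(dot_truncate_once(xs, ys))
```

Here both variants decode to 1.25, but the second performs one truncation instead of three. The headroom requirement noted in the comment is why the placement of truncations depends on the field size and the input representation size.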

EMMA: Efficient Visual Alignment in Multi-Modal LLMs

Oct 02, 2024

Sara Ghazanfari, Alexandre Araujo, Prashanth Krishnamurthy, Siddharth Garg, Farshad Khorrami

Abstract: Multi-modal Large Language Models (MLLMs) have recently exhibited impressive general-purpose capabilities by leveraging vision foundation models to encode the core concepts of images into representations. These are then combined with instructions and processed by the language model to generate high-quality responses. Despite significant progress in enhancing the language component, challenges persist in optimally fusing visual encodings within the language model for task-specific adaptability. Recent research has focused on improving this fusion through modality adaptation modules but at the cost of significantly increased model complexity and training data needs. In this paper, we propose EMMA (Efficient Multi-Modal Adaptation), a lightweight cross-modality module designed to efficiently fuse visual and textual encodings, generating instruction-aware visual representations for the language model. Our key contributions include: (1) an efficient early fusion mechanism that integrates vision and language representations with minimal added parameters (less than 0.2% increase in model size); (2) an in-depth interpretability analysis that sheds light on the internal mechanisms of the proposed method; (3) comprehensive experiments that demonstrate notable improvements on both specialized and general benchmarks for MLLMs. Empirical results show that EMMA boosts performance across multiple tasks by up to 9.3% while significantly improving robustness against hallucinations. Our code is available at https://github.com/SaraGhazanfari/EMMA

Uncertainty-Aware Deep Neural Representations for Visual Analysis of Vector Field Data

Jul 23, 2024

Atul Kumar, Siddharth Garg, Soumya Dutta

Abstract: The widespread use of Deep Neural Networks (DNNs) has recently resulted in their application to challenging scientific visualization tasks. While advanced DNNs demonstrate impressive generalization abilities, understanding factors like prediction quality, confidence, robustness, and uncertainty is crucial. These insights aid application scientists in making informed decisions. However, DNNs lack inherent mechanisms to measure prediction uncertainty, prompting the creation of distinct frameworks for constructing robust uncertainty-aware models tailored to various visualization tasks. In this work, we develop uncertainty-aware implicit neural representations to model steady-state vector fields effectively. We comprehensively evaluate the efficacy of two principled deep uncertainty estimation techniques: (1) Deep Ensemble and (2) Monte Carlo Dropout, aimed at enabling uncertainty-informed visual analysis of features within steady vector field data. Our detailed exploration using several vector data sets indicates that uncertainty-aware models generate informative visualization results of vector field features. Furthermore, incorporating prediction uncertainty improves the resilience and interpretability of our DNN model, rendering it applicable for the analysis of non-trivial vector field data sets.

* Accepted for publication at IEEE Visualization 2024

Rome was Not Built in a Single Step: Hierarchical Prompting for LLM-based Chip Design

Jul 23, 2024

Andre Nakkab, Sai Qian Zhang, Ramesh Karri, Siddharth Garg

Abstract: Large Language Models (LLMs) are effective in computer hardware synthesis via hardware description language (HDL) generation. However, LLM-assisted approaches for HDL generation struggle when handling complex tasks. We introduce a suite of hierarchical prompting techniques which facilitate efficient stepwise design methods, and develop a generalizable automation pipeline for the process. To evaluate these techniques, we present a benchmark set of hardware designs which have solutions with or without architectural hierarchy. Using these benchmarks, we compare various open-source and proprietary LLMs, including our own fine-tuned Code Llama-Verilog model. Our hierarchical methods automatically produce successful designs for complex hardware modules that standard flat prompting methods cannot achieve, allowing smaller open-source LLMs to compete with large proprietary models. Hierarchical prompting reduces HDL generation time and yields savings on LLM costs. Our experiments detail which LLMs are capable of which applications, and how to apply hierarchical methods in various modes. We explore case studies of generating complex cores using automatic scripted hierarchical prompts, including the first-ever LLM-designed processor with no human feedback.

* Accepted at MLCAD '24. 10 pages, 7 figures, 5 tables
