Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jishen Zhao

PRO-V: An Efficient Program Generation Multi-Agent System for Automatic RTL Verification

Jun 13, 2025

Yujie Zhao, Zhijing Wu, Hejia Zhang, Zhongming Yu, Wentao Ni, Chia-Tung Ho, Haoxing Ren, Jishen Zhao

Abstract:LLM-assisted hardware verification is gaining substantial attention due to its potential to significantly reduce the cost and effort of crafting effective testbenches. It also serves as a critical enabler for LLM-aided end-to-end hardware language design. However, existing current LLMs often struggle with Register Transfer Level (RTL) code generation, resulting in testbenches that exhibit functional errors in Hardware Description Languages (HDL) logic. Motivated by the strong performance of LLMs in Python code generation under inference-time sampling strategies, and their promising capabilities as judge agents, we propose PRO-V a fully program generation multi-agent system for robust RTL verification. Pro-V incorporates an efficient best-of-n iterative sampling strategy to enhance the correctness of generated testbenches. Moreover, it introduces an LLM-as-a-judge aid validation framework featuring an automated prompt generation pipeline. By converting rule-based static analysis from the compiler into natural language through in-context learning, this pipeline enables LLMs to assist the compiler in determining whether verification failures stem from errors in the RTL design or the testbench. PRO-V attains a verification accuracy of 87.17% on golden RTL implementations and 76.28% on RTL mutants. Our code is open-sourced at https://github.com/stable-lab/Pro-V.

Via

Access Paper or Ask Questions

SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters

Feb 11, 2025

Yiping Wang, Hanxian Huang, Yifang Chen, Jishen Zhao, Simon Shaolei Du, Yuandong Tian

Figure 1 for SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters

Figure 2 for SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters

Figure 3 for SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters

Figure 4 for SHARP: Accelerating Language Model Inference by SHaring Adjacent layers with Recovery Parameters

Abstract:While Large language models (LLMs) have advanced natural language processing tasks, their growing computational and memory demands make deployment on resource-constrained devices like mobile phones increasingly challenging. In this paper, we propose SHARP (SHaring Adjacent Layers with Recovery Parameters), a novel approach to accelerate LLM inference by sharing parameters across adjacent layers, thus reducing memory load overhead, while introducing low-rank recovery parameters to maintain performance. Inspired by observations that consecutive layers have similar outputs, SHARP employs a two-stage recovery process: Single Layer Warmup (SLW), and Supervised Fine-Tuning (SFT). The SLW stage aligns the outputs of the shared layers using L_2 loss, providing a good initialization for the following SFT stage to further restore the model performance. Extensive experiments demonstrate that SHARP can recover the model's perplexity on various in-distribution tasks using no more than 50k fine-tuning data while reducing the number of stored MLP parameters by 38% to 65%. We also conduct several ablation studies of SHARP and show that replacing layers towards the later parts of the model yields better performance retention, and that different recovery parameterizations perform similarly when parameter counts are matched. Furthermore, SHARP saves 42.8% in model storage and reduces the total inference time by 42.2% compared to the original Llama2-7b model on mobile devices. Our results highlight SHARP as an efficient solution for reducing inference costs in deploying LLMs without the need for pretraining-scale resources.

* 24 pages

Via

Access Paper or Ask Questions

MAGE: A Multi-Agent Engine for Automated RTL Code Generation

Dec 10, 2024

Yujie Zhao, Hejia Zhang, Hanxian Huang, Zhongming Yu, Jishen Zhao

Figure 1 for MAGE: A Multi-Agent Engine for Automated RTL Code Generation

Figure 2 for MAGE: A Multi-Agent Engine for Automated RTL Code Generation

Figure 3 for MAGE: A Multi-Agent Engine for Automated RTL Code Generation

Figure 4 for MAGE: A Multi-Agent Engine for Automated RTL Code Generation

Abstract:The automatic generation of RTL code (e.g., Verilog) through natural language instructions has emerged as a promising direction with the advancement of large language models (LLMs). However, producing RTL code that is both syntactically and functionally correct remains a significant challenge. Existing single-LLM-agent approaches face substantial limitations because they must navigate between various programming languages and handle intricate generation, verification, and modification tasks. To address these challenges, this paper introduces MAGE, the first open-source multi-agent AI system designed for robust and accurate Verilog RTL code generation. We propose a novel high-temperature RTL candidate sampling and debugging system that effectively explores the space of code candidates and significantly improves the quality of the candidates. Furthermore, we design a novel Verilog-state checkpoint checking mechanism that enables early detection of functional errors and delivers precise feedback for targeted fixes, significantly enhancing the functional correctness of the generated RTL code. MAGE achieves a 95.7% rate of syntactic and functional correctness code generation on VerilogEval-Human 2 benchmark, surpassing the state-of-the-art Claude-3.5-sonnet by 23.3 %, demonstrating a robust and reliable approach for AI-driven RTL design workflows.

* 7 pages, 4 figures

Via

Access Paper or Ask Questions

A Step Towards Mixture of Grader: Statistical Analysis of Existing Automatic Evaluation Metrics

Oct 13, 2024

Yun Joon Soh, Jishen Zhao

Figure 1 for A Step Towards Mixture of Grader: Statistical Analysis of Existing Automatic Evaluation Metrics

Figure 2 for A Step Towards Mixture of Grader: Statistical Analysis of Existing Automatic Evaluation Metrics

Figure 3 for A Step Towards Mixture of Grader: Statistical Analysis of Existing Automatic Evaluation Metrics

Figure 4 for A Step Towards Mixture of Grader: Statistical Analysis of Existing Automatic Evaluation Metrics

Abstract:The explosion of open-sourced models and Question-Answering (QA) datasets emphasizes the importance of automated QA evaluation. We studied the statistics of the existing evaluation metrics for a better understanding of their limitations. By measuring the correlation coefficients of each evaluation metric concerning human-like evaluation score, we observed the following: (1) existing metrics have a high correlation among them concerning the question type (e.g., single word, single phrase, etc.), (2) no single metric can adequately estimate the human-like evaluation. As a potential solution, we discuss how a Mixture Of Grader could potentially improve the auto QA evaluator quality.

Via

Access Paper or Ask Questions

Grounding Large Language Models In Embodied Environment With Imperfect World Models

Oct 03, 2024

Haolan Liu, Jishen Zhao

Figure 1 for Grounding Large Language Models In Embodied Environment With Imperfect World Models

Figure 2 for Grounding Large Language Models In Embodied Environment With Imperfect World Models

Figure 3 for Grounding Large Language Models In Embodied Environment With Imperfect World Models

Figure 4 for Grounding Large Language Models In Embodied Environment With Imperfect World Models

Abstract:Despite a widespread success in various applications, large language models (LLMs) often stumble when tackling basic physical reasoning or executing robotics tasks, due to a lack of direct experience with the physical nuances of the real world. To address these issues, we propose a Grounding Large language model with Imperfect world MOdel (GLIMO), which utilizes proxy world models such as simulators to collect and synthesize trining data. GLIMO incorporates an LLM agent-based data generator to automatically create high-quality and diverse instruction datasets. The generator includes an iterative self-refining module for temporally consistent experience sampling, a diverse set of question-answering instruction seeds, and a retrieval-augmented generation module for reflecting on prior experiences. Comprehensive experiments show that our approach improve the performance of strong open-source LLMs like LLaMA-3 with a performance boost of 2.04 $\times$, 1.54 $\times$, and 1.82 $\times$ across three different benchmarks, respectively. The performance is able to compete with or surpass their larger counterparts such as GPT-4.

Via

Access Paper or Ask Questions

GeoT: Tensor Centric Library for Graph Neural Network via Efficient Segment Reduction on GPU

Apr 08, 2024

Zhongming Yu, Genghan Zhang, Hanxian Huang, Xin Chen, Jishen Zhao

Abstract:In recent years, Graph Neural Networks (GNNs) have ignited a surge of innovation, significantly enhancing the processing of geometric data structures such as graphs, point clouds, and meshes. As the domain continues to evolve, a series of frameworks and libraries are being developed to push GNN efficiency to new heights. While graph-centric libraries have achieved success in the past, the advent of efficient tensor compilers has highlighted the urgent need for tensor-centric libraries. Yet, efficient tensor-centric frameworks for GNNs remain scarce due to unique challenges and limitations encountered when implementing segment reduction in GNN contexts. We introduce GeoT, a cutting-edge tensor-centric library designed specifically for GNNs via efficient segment reduction. GeoT debuts innovative parallel algorithms that not only introduce new design principles but also expand the available design space. Importantly, GeoT is engineered for straightforward fusion within a computation graph, ensuring compatibility with contemporary tensor-centric machine learning frameworks and compilers. Setting a new performance benchmark, GeoT marks a considerable advancement by showcasing an average operator speedup of 1.80x and an end-to-end speedup of 1.68x.

Via

Access Paper or Ask Questions

Multi-modal Learning for WebAssembly Reverse Engineering

Apr 04, 2024

Hanxian Huang, Jishen Zhao

Figure 1 for Multi-modal Learning for WebAssembly Reverse Engineering

Figure 2 for Multi-modal Learning for WebAssembly Reverse Engineering

Figure 3 for Multi-modal Learning for WebAssembly Reverse Engineering

Figure 4 for Multi-modal Learning for WebAssembly Reverse Engineering

Abstract:The increasing adoption of WebAssembly (Wasm) for performance-critical and security-sensitive tasks drives the demand for WebAssembly program comprehension and reverse engineering. Recent studies have introduced machine learning (ML)-based WebAssembly reverse engineering tools. Yet, the generalization of task-specific ML solutions remains challenging, because their effectiveness hinges on the availability of an ample supply of high-quality task-specific labeled data. Moreover, previous works overlook the high-level semantics present in source code and its documentation. Acknowledging the abundance of available source code with documentation, which can be compiled into WebAssembly, we propose to learn representations of them concurrently and harness their mutual relationships for effective WebAssembly reverse engineering. In this paper, we present WasmRev, the first multi-modal pre-trained language model for WebAssembly reverse engineering. WasmRev is pre-trained using self-supervised learning on a large-scale multi-modal corpus encompassing source code, code documentation and the compiled WebAssembly, without requiring labeled data. WasmRev incorporates three tailored multi-modal pre-training tasks to capture various characteristics of WebAssembly and cross-modal relationships. WasmRev is only trained once to produce general-purpose representations that can broadly support WebAssembly reverse engineering tasks through few-shot fine-tuning with much less labeled data, improving data efficiency. We fine-tune WasmRev onto three important reverse engineering tasks: type recovery, function purpose identification and WebAssembly summarization. Our results show that WasmRev pre-trained on the corpus of multi-modal samples establishes a robust foundation for these tasks, achieving high task accuracy and outperforming the state-of-the-art ML methods for WebAssembly reverse engineering.

* Accepted by ISSTA '24

Via

Access Paper or Ask Questions

Learning to Maximize Mutual Information for Chain-of-Thought Distillation

Mar 05, 2024

Xin Chen, Hanxian Huang, Yanjun Gao, Yi Wang, Jishen Zhao, Ke Ding

Figure 1 for Learning to Maximize Mutual Information for Chain-of-Thought Distillation

Figure 2 for Learning to Maximize Mutual Information for Chain-of-Thought Distillation

Figure 3 for Learning to Maximize Mutual Information for Chain-of-Thought Distillation

Figure 4 for Learning to Maximize Mutual Information for Chain-of-Thought Distillation

Abstract:Knowledge distillation, the technique of transferring knowledge from large, complex models to smaller ones, marks a pivotal step towards efficient AI deployment. Distilling Step-by-Step (DSS), a novel method utilizing chain-of-thought (CoT) distillation, has demonstrated promise by imbuing smaller models with the superior reasoning capabilities of their larger counterparts. In DSS, the distilled model acquires the ability to generate rationales and predict labels concurrently through a multi-task learning framework. However, DSS overlooks the intrinsic relationship between the two training tasks, leading to ineffective integration of CoT knowledge with the task of label prediction. To this end, we investigate the mutual relationship of the two tasks from Information Bottleneck perspective and formulate it as maximizing the mutual information of the representation features of the two tasks. We propose a variational approach to solve this optimization problem using a learning-based method. Our experimental results across four datasets demonstrate that our method outperforms the state-of-the-art DSS. Our findings offer insightful guidance for future research on language model distillation as well as applications involving CoT. Code and models will be released soon.

Via

Access Paper or Ask Questions

Sibyl: Forecasting Time-Evolving Query Workloads

Jan 08, 2024

Hanxian Huang, Tarique Siddiqui, Rana Alotaibi, Carlo Curino, Jyoti Leeka, Alekh Jindal, Jishen Zhao, Jesus Camacho-Rodriguez, Yuanyuan Tian

Abstract:Database systems often rely on historical query traces to perform workload-based performance tuning. However, real production workloads are time-evolving, making historical queries ineffective for optimizing future workloads. To address this challenge, we propose SIBYL, an end-to-end machine learning-based framework that accurately forecasts a sequence of future queries, with the entire query statements, in various prediction windows. Drawing insights from real-workloads, we propose template-based featurization techniques and develop a stacked-LSTM with an encoder-decoder architecture for accurate forecasting of query workloads. We also develop techniques to improve forecasting accuracy over large prediction windows and achieve high scalability over large workloads with high variability in arrival rates of queries. Finally, we propose techniques to handle workload drifts. Our evaluation on four real workloads demonstrates that SIBYL can forecast workloads with an $87.3\%$ median F1 score, and can result in $1.7\times$ and $1.3\times$ performance improvement when applied to materialized view selection and index selection applications, respectively.

* The paper has been accepted by SIGMOD 2024

Via

Access Paper or Ask Questions

Interpretable and Flexible Target-Conditioned Neural Planners For Autonomous Vehicles

Sep 23, 2023

Haolan Liu, Jishen Zhao, Liangjun Zhang

Abstract:Learning-based approaches to autonomous vehicle planners have the potential to scale to many complicated real-world driving scenarios by leveraging huge amounts of driver demonstrations. However, prior work only learns to estimate a single planning trajectory, while there may be multiple acceptable plans in real-world scenarios. To solve the problem, we propose an interpretable neural planner to regress a heatmap, which effectively represents multiple potential goals in the bird's-eye view of an autonomous vehicle. The planner employs an adaptive Gaussian kernel and relaxed hourglass loss to better capture the uncertainty of planning problems. We also use a negative Gaussian kernel to add supervision to the heatmap regression, enabling the model to learn collision avoidance effectively. Our systematic evaluation on the Lyft Open Dataset across a diverse range of real-world driving scenarios shows that our model achieves a safer and more flexible driving performance than prior works.

Via

Access Paper or Ask Questions