Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhe Hou

Adaptive Plan-Execute Framework for Smart Contract Security Auditing

May 21, 2025

Zhiyuan Wei, Jing Sun, Zijian Zhang, Zhe Hou, Zixiao Zhao

Abstract:Large Language Models (LLMs) have shown great promise in code analysis and auditing; however, they still struggle with hallucinations and limited context-aware reasoning. We introduce SmartAuditFlow, a novel Plan-Execute framework that enhances smart contract security analysis through dynamic audit planning and structured execution. Unlike conventional LLM-based auditing approaches that follow fixed workflows and predefined steps, SmartAuditFlow dynamically generates and refines audit plans based on the unique characteristics of each smart contract. It continuously adjusts its auditing strategy in response to intermediate LLM outputs and newly detected vulnerabilities, ensuring a more adaptive and precise security assessment. The framework then executes these plans step by step, applying a structured reasoning process to enhance vulnerability detection accuracy while minimizing hallucinations and false positives. To further improve audit precision, SmartAuditFlow integrates iterative prompt optimization and external knowledge sources, such as static analysis tools and Retrieval-Augmented Generation (RAG). This ensures audit decisions are contextually informed and backed by real-world security knowledge, producing comprehensive security reports. Extensive evaluations across multiple benchmarks demonstrate that SmartAuditFlow outperforms existing methods, achieving 100 percent accuracy on common and critical vulnerabilities, 41.2 percent accuracy for comprehensive coverage of known smart contract weaknesses in real-world projects, and successfully identifying all 13 tested CVEs. These results highlight SmartAuditFlow's scalability, cost-effectiveness, and superior adaptability over traditional static analysis tools and contemporary LLM-based approaches, establishing it as a robust solution for automated smart contract auditing.

* 30 pages, 5 figures

Via

Access Paper or Ask Questions

Clustering and analysis of user behaviour in blockchain: A case study of Planet IX

Apr 16, 2025

Dorottya Zelenyanszki, Zhe Hou, Kamanashis Biswas, Vallipuram Muthukkumarasamy

Abstract:Decentralised applications (dApps) that run on public blockchains have the benefit of trustworthiness and transparency as every activity that happens on the blockchain can be publicly traced through the transaction data. However, this introduces a potential privacy problem as this data can be tracked and analysed, which can reveal user-behaviour information. A user behaviour analysis pipeline was proposed to present how this type of information can be extracted and analysed to identify separate behavioural clusters that can describe how users behave in the game. The pipeline starts with the collection of transaction data, involving smart contracts, that is collected from a blockchain-based game called Planet IX. Both the raw transaction information and the transaction events are considered in the data collection. From this data, separate game actions can be formed and those are leveraged to present how and when the users conducted their in-game activities in the form of user flows. An extended version of these user flows also presents how the Non-Fungible Tokens (NFTs) are being leveraged in the user actions. The latter is given as input for a Graph Neural Network (GNN) model to provide graph embeddings for these flows which then can be leveraged by clustering algorithms to cluster user behaviours into separate behavioural clusters. We benchmark and compare well-known clustering algorithms as a part of the proposed method. The user behaviour clusters were analysed and visualised in a graph format. It was found that behavioural information can be extracted regarding the users that belong to these clusters. Such information can be exploited by malicious users to their advantage. To demonstrate this, a privacy threat model was also presented based on the results that correspond to multiple potentially affected areas.

* 15 pages, 8 figures, submitted to Blockchain: Research and Applications

Via

Access Paper or Ask Questions

F$^3$Set: Towards Analyzing Fast, Frequent, and Fine-grained Events from Videos

Apr 15, 2025

Zhaoyu Liu, Kan Jiang, Murong Ma, Zhe Hou, Yun Lin, Jin Song Dong

Abstract:Analyzing Fast, Frequent, and Fine-grained (F$^3$) events presents a significant challenge in video analytics and multi-modal LLMs. Current methods struggle to identify events that satisfy all the F$^3$ criteria with high accuracy due to challenges such as motion blur and subtle visual discrepancies. To advance research in video understanding, we introduce F$^3$Set, a benchmark that consists of video datasets for precise F$^3$ event detection. Datasets in F$^3$Set are characterized by their extensive scale and comprehensive detail, usually encompassing over 1,000 event types with precise timestamps and supporting multi-level granularity. Currently, F$^3$Set contains several sports datasets, and this framework may be extended to other applications as well. We evaluated popular temporal action understanding methods on F$^3$Set, revealing substantial challenges for existing techniques. Additionally, we propose a new method, F$^3$ED, for F$^3$ event detections, achieving superior performance. The dataset, model, and benchmark code are available at https://github.com/F3Set/F3Set.

* ICLR 2025; Website URL: https://lzyandy.github.io/f3set-website/

Via

Access Paper or Ask Questions

The Fusion of Large Language Models and Formal Methods for Trustworthy AI Agents: A Roadmap

Dec 09, 2024

Yedi Zhang, Yufan Cai, Xinyue Zuo, Xiaokun Luan, Kailong Wang, Zhe Hou, Yifan Zhang, Zhiyuan Wei, Meng Sun, Jun Sun(+2 more)

Figure 1 for The Fusion of Large Language Models and Formal Methods for Trustworthy AI Agents: A Roadmap

Figure 2 for The Fusion of Large Language Models and Formal Methods for Trustworthy AI Agents: A Roadmap

Figure 3 for The Fusion of Large Language Models and Formal Methods for Trustworthy AI Agents: A Roadmap

Abstract:Large Language Models (LLMs) have emerged as a transformative AI paradigm, profoundly influencing daily life through their exceptional language understanding and contextual generation capabilities. Despite their remarkable performance, LLMs face a critical challenge: the propensity to produce unreliable outputs due to the inherent limitations of their learning-based nature. Formal methods (FMs), on the other hand, are a well-established computation paradigm that provides mathematically rigorous techniques for modeling, specifying, and verifying the correctness of systems. FMs have been extensively applied in mission-critical software engineering, embedded systems, and cybersecurity. However, the primary challenge impeding the deployment of FMs in real-world settings lies in their steep learning curves, the absence of user-friendly interfaces, and issues with efficiency and adaptability. This position paper outlines a roadmap for advancing the next generation of trustworthy AI systems by leveraging the mutual enhancement of LLMs and FMs. First, we illustrate how FMs, including reasoning and certification techniques, can help LLMs generate more reliable and formally certified outputs. Subsequently, we highlight how the advanced learning capabilities and adaptability of LLMs can significantly enhance the usability, efficiency, and scalability of existing FM tools. Finally, we show that unifying these two computation paradigms -- integrating the flexibility and intelligence of LLMs with the rigorous reasoning abilities of FMs -- has transformative potential for the development of trustworthy AI software systems. We acknowledge that this integration has the potential to enhance both the trustworthiness and efficiency of software engineering practices while fostering the development of intelligent FM tools capable of addressing complex yet real-world challenges.

* 24 pages, 4 figures

Via

Access Paper or Ask Questions

VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs

Oct 25, 2024

Zixiao Zhao, Jing Sun, Zhiyuan Wei, Cheng-Hao Cai, Zhe Hou, Jin Song Dong

Figure 1 for VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs

Figure 2 for VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs

Figure 3 for VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs

Figure 4 for VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs

Abstract:In the field of automated programming, large language models (LLMs) have demonstrated foundational generative capabilities when given detailed task descriptions. However, their current functionalities are primarily limited to function-level development, restricting their effectiveness in complex project environments and specific application scenarios, such as complicated image-processing tasks. This paper presents a multi-agent framework that utilises a hybrid set of LLMs, including GPT-4o and locally deployed open-source models, which collaboratively complete auto-programming tasks. Each agent plays a distinct role in the software development cycle, collectively forming a virtual organisation that works together to produce software products. By establishing a tree-structured thought distribution and development mechanism across project, module, and function levels, this framework offers a cost-effective and efficient solution for code generation. We evaluated our approach using benchmark datasets, and the experimental results demonstrate that VisionCoder significantly outperforms existing methods in image processing auto-programming tasks.

Via

Access Paper or Ask Questions

Towards Large Language Model Aided Program Refinement

Jun 26, 2024

Yufan Cai, Zhe Hou, Xiaokun Luan, David Miguel Sanan Baena, Yun Lin, Jun Sun, Jin Song Dong

Abstract:Program refinement involves correctness-preserving transformations from formal high-level specification statements into executable programs. Traditional verification tool support for program refinement is highly interactive and lacks automation. On the other hand, the emergence of large language models (LLMs) enables automatic code generations from informal natural language specifications. However, code generated by LLMs is often unreliable. Moreover, the opaque procedure from specification to code provided by LLM is an uncontrolled black box. We propose LLM4PR, a tool that combines formal program refinement techniques with informal LLM-based methods to (1) transform the specification to preconditions and postconditions, (2) automatically build prompts based on refinement calculus, (3) interact with LLM to generate code, and finally, (4) verify that the generated code satisfies the conditions of refinement calculus, thus guaranteeing the correctness of the code. We have implemented our tool using GPT4, Coq, and Coqhammer, and evaluated it on the HumanEval and EvalPlus datasets.

Via

Access Paper or Ask Questions

Adversarial Robustness of Deep Neural Networks: A Survey from a Formal Verification Perspective

Jun 24, 2022

Mark Huasong Meng, Guangdong Bai, Sin Gee Teo, Zhe Hou, Yan Xiao, Yun Lin, Jin Song Dong

Figure 1 for Adversarial Robustness of Deep Neural Networks: A Survey from a Formal Verification Perspective

Figure 2 for Adversarial Robustness of Deep Neural Networks: A Survey from a Formal Verification Perspective

Figure 3 for Adversarial Robustness of Deep Neural Networks: A Survey from a Formal Verification Perspective

Figure 4 for Adversarial Robustness of Deep Neural Networks: A Survey from a Formal Verification Perspective

Abstract:Neural networks have been widely applied in security applications such as spam and phishing detection, intrusion prevention, and malware detection. This black-box method, however, often has uncertainty and poor explainability in applications. Furthermore, neural networks themselves are often vulnerable to adversarial attacks. For those reasons, there is a high demand for trustworthy and rigorous methods to verify the robustness of neural network models. Adversarial robustness, which concerns the reliability of a neural network when dealing with maliciously manipulated inputs, is one of the hottest topics in security and machine learning. In this work, we survey existing literature in adversarial robustness verification for neural networks and collect 39 diversified research works across machine learning, security, and software engineering domains. We systematically analyze their approaches, including how robustness is formulated, what verification techniques are used, and the strengths and limitations of each technique. We provide a taxonomy from a formal verification perspective for a comprehensive understanding of this topic. We classify the existing techniques based on property specification, problem reduction, and reasoning strategies. We also demonstrate representative techniques that have been applied in existing studies with a sample model. Finally, we discuss open questions for future research.

Via

Access Paper or Ask Questions

MUC-driven Feature Importance Measurement and Adversarial Analysis for Random Forest

Feb 25, 2022

Shucen Ma, Jianqi Shi, Yanhong Huang, Shengchao Qin, Zhe Hou

Figure 1 for MUC-driven Feature Importance Measurement and Adversarial Analysis for Random Forest

Figure 2 for MUC-driven Feature Importance Measurement and Adversarial Analysis for Random Forest

Figure 3 for MUC-driven Feature Importance Measurement and Adversarial Analysis for Random Forest

Figure 4 for MUC-driven Feature Importance Measurement and Adversarial Analysis for Random Forest

Abstract:The broad adoption of Machine Learning (ML) in security-critical fields demands the explainability of the approach. However, the research on understanding ML models, such as Random Forest (RF), is still in its infant stage. In this work, we leverage formal methods and logical reasoning to develop a novel model-specific method for explaining the prediction of RF. Our approach is centered around Minimal Unsatisfiable Cores (MUC) and provides a comprehensive solution for feature importance, covering local and global aspects, and adversarial sample analysis. Experimental results on several datasets illustrate the high quality of our feature importance measurement. We also demonstrate that our adversarial analysis outperforms the state-of-the-art method. Moreover, our method can produce a user-centered report, which helps provide recommendations in real-life applications.

Via

Access Paper or Ask Questions

An Executable Formal Model of the VHDL in Isabelle/HOL

Feb 08, 2022

Wilayat Khan, Zhe Hou, David Sanan, Jamel Nebhen, Yang Liu, Alwen Tiu

Abstract:In the hardware design process, hardware components are usually described in a hardware description language. Most of the hardware description languages, such as Verilog and VHDL, do not have mathematical foundation and hence are not fit for formal reasoning about the design. To enable formal reasoning in one of the most commonly used description language VHDL, we define a formal model of the VHDL language in Isabelle/HOL. Our model targets the functional part of VHDL designs used in industry, specifically the design of the LEON3 processor's integer unit. We cover a wide range of features in the VHDL language that are usually not modelled in the literature and define a novel operational semantics for it. Furthermore, our model can be exported to OCaml code for execution, turning the formal model into a VHDL simulator. We have tested our simulator against simple designs used in the literature, as well as the div32 module in the LEON3 design. The Isabelle/HOL code is publicly available: https://zhehou.github.io/apps/VHDLModel.zip

Via

Access Paper or Ask Questions

Silas: High Performance, Explainable and Verifiable Machine Learning

Oct 03, 2019

Hadrien Bride, Zhe Hou, Jie Dong, Jin Song Dong, Ali Mirjalili

Figure 1 for Silas: High Performance, Explainable and Verifiable Machine Learning

Figure 2 for Silas: High Performance, Explainable and Verifiable Machine Learning

Figure 3 for Silas: High Performance, Explainable and Verifiable Machine Learning

Figure 4 for Silas: High Performance, Explainable and Verifiable Machine Learning

Abstract:This paper introduces a new classification tool named Silas, which is built to provide a more transparent and dependable data analytics service. A focus of Silas is on providing a formal foundation of decision trees in order to support logical analysis and verification of learned prediction models. This paper describes the distinct features of Silas: The Model Audit module formally verifies the prediction model against user specifications, the Enforcement Learning module trains prediction models that are guaranteed correct, the Model Insight and Prediction Insight modules reason about the prediction model and explain the decision-making of predictions. We also discuss implementation details ranging from programming paradigm to memory management that help achieve high-performance computation.

Via

Access Paper or Ask Questions