Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiuwei Shang

AutoPT: How Far Are We from the End2End Automated Web Penetration Testing?

Nov 02, 2024

Benlong Wu, Guoqiang Chen, Kejiang Chen, Xiuwei Shang, Jiapeng Han, Yanru He, Weiming Zhang, Nenghai Yu

Figure 1 for AutoPT: How Far Are We from the End2End Automated Web Penetration Testing?

Figure 2 for AutoPT: How Far Are We from the End2End Automated Web Penetration Testing?

Figure 3 for AutoPT: How Far Are We from the End2End Automated Web Penetration Testing?

Figure 4 for AutoPT: How Far Are We from the End2End Automated Web Penetration Testing?

Abstract:Penetration testing is essential to ensure Web security, which can detect and fix vulnerabilities in advance, and prevent data leakage and serious consequences. The powerful inference capabilities of large language models (LLMs) have made significant progress in various fields, and the development potential of LLM-based agents can revolutionize the cybersecurity penetration testing industry. In this work, we establish a comprehensive end-to-end penetration testing benchmark using a real-world penetration testing environment to explore the capabilities of LLM-based agents in this domain. Our results reveal that the agents are familiar with the framework of penetration testing tasks, but they still face limitations in generating accurate commands and executing complete processes. Accordingly, we summarize the current challenges, including the difficulty of maintaining the entire message history and the tendency for the agent to become stuck. Based on the above insights, we propose a Penetration testing State Machine (PSM) that utilizes the Finite State Machine (FSM) methodology to address these limitations. Then, we introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs, which utilizes the inherent inference ability of LLM and the constraint framework of state machines. Our evaluation results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model and improves the task completion rate from 22% to 41% on the benchmark target. Compared with the baseline framework and manual work, AutoPT also reduces time and economic costs further. Hence, our AutoPT has facilitated the development of automated penetration testing and significantly impacted both academia and industry.

* 22 pages, 6 figures

Via

Access Paper or Ask Questions

RealVul: Can We Detect Vulnerabilities in Web Applications with LLM?

Oct 10, 2024

Di Cao, Yong Liao, Xiuwei Shang

Figure 1 for RealVul: Can We Detect Vulnerabilities in Web Applications with LLM?

Figure 2 for RealVul: Can We Detect Vulnerabilities in Web Applications with LLM?

Figure 3 for RealVul: Can We Detect Vulnerabilities in Web Applications with LLM?

Figure 4 for RealVul: Can We Detect Vulnerabilities in Web Applications with LLM?

Abstract:The latest advancements in large language models (LLMs) have sparked interest in their potential for software vulnerability detection. However, there is currently a lack of research specifically focused on vulnerabilities in the PHP language, and challenges in extracting samples and processing persist, hindering the model's ability to effectively capture the characteristics of specific vulnerabilities. In this paper, we present RealVul, the first LLM-based framework designed for PHP vulnerability detection, addressing these issues. By vulnerability candidate detection methods and employing techniques such as normalization, we can isolate potential vulnerability triggers while streamlining the code and eliminating unnecessary semantic information, enabling the model to better understand and learn from the generated vulnerability samples. We also address the issue of insufficient PHP vulnerability samples by improving data synthesis methods. To evaluate RealVul's performance, we conduct an extensive analysis using five distinct code LLMs on vulnerability data from 180 PHP projects. The results demonstrate a significant improvement in both effectiveness and generalization compared to existing methods, effectively boosting the vulnerability detection capabilities of these models.

Via

Access Paper or Ask Questions