Title: VulDetectBench: Evaluating the Deep Capability of Vulnerability Detection with Large Language Models

Jun 11, 2024

Yu Liu, Mingxin Yang, Yu Xie, Ping Chen, Xiaojin Zhang, Wei Chen

Abstract: Large Language Models (LLMs) are trained on corpora containing large amounts of program code, which greatly improves their code comprehension and generation capabilities. However, sound, comprehensive research on detecting program vulnerabilities, a more specific code-related task, and on evaluating the performance of LLMs in this more specialized scenario is still lacking. To address common challenges in vulnerability analysis, our study introduces a new benchmark, VulDetectBench, specifically designed to assess the vulnerability detection capabilities of LLMs. The benchmark comprehensively evaluates LLMs' ability to identify, classify, and locate vulnerabilities through five tasks of increasing difficulty. We evaluate the performance of 17 models (both open- and closed-source) and find that while existing models can achieve over 80% accuracy on tasks related to vulnerability identification and classification, they still fall short on specific, more detailed vulnerability analysis tasks, with less than 30% accuracy, making it difficult for them to provide valuable auxiliary information for professional vulnerability mining. Our benchmark effectively evaluates the capabilities of various LLMs at different levels in the specific task of vulnerability detection, providing a foundation for future research and improvements in this critical area of code security. VulDetectBench is publicly available at https://github.com/Sweetaroo/VulDetectBench.
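To make the accuracy figures in the abstract concrete, the following is a minimal sketch of how accuracy on a binary vulnerability-identification task could be scored. It is not the benchmark's own scoring code: the YES/NO answer format, the helper names `parse_yes_no` and `accuracy`, and the sample responses are all assumptions made for illustration.

```python
from typing import Sequence


def parse_yes_no(response: str) -> bool:
    """Map a free-text model reply to a boolean.

    Assumption: the model is prompted to start its answer with YES or NO.
    """
    return response.strip().upper().startswith("YES")


def accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of samples where the prediction matches the ground-truth label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical model replies and ground-truth labels (True = vulnerable).
responses = ["YES", "no", "Yes, the buffer can overflow.", "NO"]
labels = [True, False, False, False]

predictions = [parse_yes_no(r) for r in responses]
print(accuracy(predictions, labels))  # 0.75
```

Here three of the four hypothetical replies match their labels, so the score is 0.75. The more detailed tasks described in the abstract would need different metrics than this simple match rate.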
