Hazim Hanif

VulBERTa: Simplified Source Code Pre-Training for Vulnerability Detection

May 25, 2022

Hazim Hanif, Sergio Maffeis

Abstract: This paper presents VulBERTa, a deep learning approach to detect security vulnerabilities in source code. Our approach pre-trains a RoBERTa model with a custom tokenisation pipeline on real-world code from open-source C/C++ projects. The model learns a deep knowledge representation of code syntax and semantics, which we leverage to train vulnerability detection classifiers. We evaluate our approach on binary and multi-class vulnerability detection tasks across several datasets (Vuldeepecker, Draper, REVEAL and muVuldeepecker) and benchmarks (CodeXGLUE and D2A). The evaluation results show that VulBERTa achieves state-of-the-art performance and outperforms existing approaches across different datasets, despite its conceptual simplicity and its limited cost in terms of training data size and number of model parameters.

* Accepted as a conference paper at IJCNN 2022

A Hybrid Graph Neural Network Approach for Detecting PHP Vulnerabilities

Dec 16, 2020

Rishi Rabheru, Hazim Hanif, Sergio Maffeis

Abstract: This paper presents DeepTective, a deep learning approach to detect vulnerabilities in PHP source code. Our approach implements a novel hybrid technique that combines Gated Recurrent Units and Graph Convolutional Networks to detect SQL injection (SQLi), cross-site scripting (XSS) and OS command injection (OSCI) vulnerabilities, leveraging both syntactic and semantic information. We evaluate DeepTective and compare it to the state of the art on an established synthetic dataset and on a novel real-world dataset collected from GitHub. Experimental results show that DeepTective achieves near-perfect classification on the synthetic dataset and an F1 score of 88.12% on the realistic dataset, outperforming related approaches. We validate DeepTective in the wild by discovering 4 novel vulnerabilities in established WordPress plugins.

* A poster version of this paper appeared as https://doi.org/10.1145/3412841.3442132
