Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Pietro Liguori

PyResBugs: A Dataset of Residual Python Bugs for Natural Language-Driven Fault Injection

May 09, 2025

Domenico Cotroneo, Giuseppe De Rosa, Pietro Liguori

Abstract:This paper presents PyResBugs, a curated dataset of residual bugs, i.e., defects that persist undetected during traditional testing but later surface in production, collected from major Python frameworks. Each bug in the dataset is paired with its corresponding fault-free (fixed) version and annotated with multi-level natural language (NL) descriptions. These NL descriptions enable natural language-driven fault injection, offering a novel approach to simulating real-world faults in software systems. By bridging the gap between software fault injection techniques and real-world representativeness, PyResBugs provides researchers with a high-quality resource for advancing AI-driven automated testing in Python systems.

Via

Access Paper or Ask Questions

CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection

Jan 08, 2025

Ruijun Feng, Hammond Pearce, Pietro Liguori, Yulei Sui

Abstract:Large language models (LLMs) have been proposed as powerful tools for detecting software vulnerabilities, where task-specific fine-tuning is typically employed to provide vulnerability-specific knowledge to the LLMs for this purpose. However, traditional full-parameter fine-tuning is inefficient for modern, complex LLMs, which contain billions of parameters. Soft prompt tuning has been suggested as a more efficient alternative for fine-tuning LLMs in general cases. However, pure soft prompt tuning treats source code as plain text, losing structural information inherent in source code. Meanwhile, graph-enhanced soft prompt tuning methods, which aim to address this issue, are unable to preserve the rich semantic information within code graphs, as they are primarily designed for general graph-related tasks and focus more on adjacency information. They also fail to ensure computational efficiency while accounting for graph-text interactions. This paper, therefore, introduces a new code graph-enhanced, structure-aware soft prompt tuning method for vulnerability detection, referred to as CGP-Tuning. It employs innovative type-aware embeddings to capture the rich semantic information within code graphs, along with a novel and efficient cross-modal alignment module that achieves linear computational cost while incorporating graph-text interactions. The proposed CGP-Tuning is evaluated on the latest DiverseVul dataset and the most recent open-source code LLMs, CodeLlama and CodeGemma. Experimental results demonstrate that CGP-Tuning outperforms the best state-of-the-art method by an average of 3.5 percentage points in accuracy, without compromising its vulnerability detection capabilities for long source code.

* 14 pages, 5 figures

Via

Access Paper or Ask Questions

Enhancing AI-based Generation of Software Exploits with Contextual Information

Aug 06, 2024

Pietro Liguori, Cristina Improta, Roberto Natella, Bojan Cukic, Domenico Cotroneo

Abstract:This practical experience report explores Neural Machine Translation (NMT) models' capability to generate offensive security code from natural language (NL) descriptions, highlighting the significance of contextual understanding and its impact on model performance. Our study employs a dataset comprising real shellcodes to evaluate the models across various scenarios, including missing information, necessary context, and unnecessary context. The experiments are designed to assess the models' resilience against incomplete descriptions, their proficiency in leveraging context for enhanced accuracy, and their ability to discern irrelevant information. The findings reveal that the introduction of contextual data significantly improves performance. However, the benefits of additional context diminish beyond a certain point, indicating an optimal level of contextual information for model training. Moreover, the models demonstrate an ability to filter out unnecessary context, maintaining high levels of accuracy in the generation of offensive security code. This study paves the way for future research on optimizing context use in AI-driven code generation, particularly for applications requiring a high degree of technical precision such as the generation of offensive code.

* Accepted for publication at The 35th IEEE International Symposium on Software Reliability Engineering

Via

Access Paper or Ask Questions

AI Code Generators for Security: Friend or Foe?

Feb 02, 2024

Roberto Natella, Pietro Liguori, Cristina Improta, Bojan Cukic, Domenico Cotroneo

Figure 1 for AI Code Generators for Security: Friend or Foe?

Figure 2 for AI Code Generators for Security: Friend or Foe?

Figure 3 for AI Code Generators for Security: Friend or Foe?

Figure 4 for AI Code Generators for Security: Friend or Foe?

Abstract:Recent advances of artificial intelligence (AI) code generators are opening new opportunities in software security research, including misuse by malicious actors. We review use cases for AI code generators for security and introduce an evaluation benchmark.

* IEEE Security & Privacy, Early Access, February 2024
* Dataset available at: https://github.com/dessertlab/violent-python

Via

Access Paper or Ask Questions

Automating the Correctness Assessment of AI-generated Code for Security Contexts

Oct 28, 2023

Domenico Cotroneo, Alessio Foggia, Cristina Improta, Pietro Liguori, Roberto Natella

Abstract:In this paper, we propose a fully automated method, named ACCA, to evaluate the correctness of AI-generated code for security purposes. The method uses symbolic execution to assess whether the AI-generated code behaves as a reference implementation. We use ACCA to assess four state-of-the-art models trained to generate security-oriented assembly code and compare the results of the evaluation with different baseline solutions, including output similarity metrics, widely used in the field, and the well-known ChatGPT, the AI-powered language model developed by OpenAI. Our experiments show that our method outperforms the baseline solutions and assesses the correctness of the AI-generated code similar to the human-based evaluation, which is considered the ground truth for the assessment in the field. Moreover, ACCA has a very strong correlation with human evaluation (Pearson's correlation coefficient r=0.84 on average). Finally, since it is a fully automated solution that does not require any human intervention, the proposed method performs the assessment of every code snippet in ~0.17s on average, which is definitely lower than the average time required by human analysts to manually inspect the code, based on our experience.

Via

Access Paper or Ask Questions

Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks

Aug 04, 2023

Domenico Cotroneo, Cristina Improta, Pietro Liguori, Roberto Natella

Figure 1 for Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks

Figure 2 for Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks

Figure 3 for Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks

Figure 4 for Vulnerabilities in AI Code Generators: Exploring Targeted Data Poisoning Attacks

Abstract:In this work, we assess the security of AI code generators via data poisoning, i.e., an attack that injects malicious samples into the training data to generate vulnerable code. We poison the training data by injecting increasing amounts of code containing security vulnerabilities and assess the attack's success on different state-of-the-art models for code generation. Our analysis shows that AI code generators are vulnerable to even a small amount of data poisoning. Moreover, the attack does not impact the correctness of code generated by pre-trained models, making it hard to detect.

Via

Access Paper or Ask Questions

Enhancing Robustness of AI Offensive Code Generators via Data Augmentation

Jun 08, 2023

Cristina Improta, Pietro Liguori, Roberto Natella, Bojan Cukic, Domenico Cotroneo

Abstract:In this work, we present a method to add perturbations to the code descriptions, i.e., new inputs in natural language (NL) from well-intentioned developers, in the context of security-oriented code, and analyze how and to what extent perturbations affect the performance of AI offensive code generators. Our experiments show that the performance of the code generators is highly affected by perturbations in the NL descriptions. To enhance the robustness of the code generators, we use the method to perform data augmentation, i.e., to increase the variability and diversity of the training data, proving its effectiveness against both perturbed and non-perturbed code descriptions.

Via

Access Paper or Ask Questions

Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators

Dec 12, 2022

Cristina Improta, Pietro Liguori, Roberto Natella, Bojan Cukic, Domenico Cotroneo

Abstract:AI-based code generators are an emerging solution for automatically writing programs starting from descriptions in natural language, by using deep neural networks (Neural Machine Translation, NMT). In particular, code generators have been used for ethical hacking and offensive security testing by generating proof-of-concept attacks. Unfortunately, the evaluation of code generators still faces several issues. The current practice uses automatic metrics, which compute the textual similarity of generated code with ground-truth references. However, it is not clear what metric to use, and which metric is most suitable for specific contexts. This practical experience report analyzes a large set of output similarity metrics on offensive code generators. We apply the metrics on two state-of-the-art NMT models using two datasets containing offensive assembly and Python code with their descriptions in the English language. We compare the estimates from the automatic metrics with human evaluation and provide practical insights into their strengths and limitations.

Via

Access Paper or Ask Questions

Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT Models for Code Generation

Mar 30, 2022

Pietro Liguori, Cristina Improta, Simona De Vivo, Roberto Natella, Bojan Cukic, Domenico Cotroneo

Figure 1 for Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT Models for Code Generation

Figure 2 for Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT Models for Code Generation

Figure 3 for Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT Models for Code Generation

Abstract:Neural Machine Translation (NMT) has reached a level of maturity to be recognized as the premier method for the translation between different languages and aroused interest in different research areas, including software engineering. A key step to validate the robustness of the NMT models consists in evaluating the performance of the models on adversarial inputs, i.e., inputs obtained from the original ones by adding small amounts of perturbation. However, when dealing with the specific task of the code generation (i.e., the generation of code starting from a description in natural language), it has not yet been defined an approach to validate the robustness of the NMT models. In this work, we address the problem by identifying a set of perturbations and metrics tailored for the robustness assessment of such models. We present a preliminary experimental evaluation, showing what type of perturbations affect the model the most and deriving useful insights for future directions.

* Paper accepted for publication in the proceedings of The 1st Intl. Workshop on Natural Language-based Software Engineering (NLBSE) to be held with ICSE 2022

Via

Access Paper or Ask Questions

Can We Generate Shellcodes via Natural Language? An Empirical Study

Feb 08, 2022

Pietro Liguori, Erfan Al-Hossami, Domenico Cotroneo, Roberto Natella, Bojan Cukic, Samira Shaikh

Figure 1 for Can We Generate Shellcodes via Natural Language? An Empirical Study

Figure 2 for Can We Generate Shellcodes via Natural Language? An Empirical Study

Figure 3 for Can We Generate Shellcodes via Natural Language? An Empirical Study

Figure 4 for Can We Generate Shellcodes via Natural Language? An Empirical Study

Abstract:Writing software exploits is an important practice for offensive security analysts to investigate and prevent attacks. In particular, shellcodes are especially time-consuming and a technical challenge, as they are written in assembly language. In this work, we address the task of automatically generating shellcodes, starting purely from descriptions in natural language, by proposing an approach based on Neural Machine Translation (NMT). We then present an empirical study using a novel dataset (Shellcode_IA32), which consists of 3,200 assembly code snippets of real Linux/x86 shellcodes from public databases, annotated using natural language. Moreover, we propose novel metrics to evaluate the accuracy of NMT at generating shellcodes. The empirical analysis shows that NMT can generate assembly code snippets from the natural language with high accuracy and that in many cases can generate entire shellcodes with no errors.

* 33 pages, 5 figures, 9 tables. To be published in Automated Software Engineering journal

Via

Access Paper or Ask Questions