Martin Monperrus

INRIA Lille - Nord Europe

RepairBench: Leaderboard of Frontier Models for Program Repair

Sep 27, 2024

André Silva, Martin Monperrus

Abstract: AI-driven program repair uses AI models to repair buggy software by producing patches. Rapid advancements in AI surely impact state-of-the-art performance of program repair. Yet, grasping this progress requires frequent and standardized evaluations. We propose RepairBench, a novel leaderboard for AI-driven program repair. The key characteristics of RepairBench are: 1) it is execution-based: all patches are compiled and executed against a test suite, 2) it assesses frontier models in a frequent and standardized way. RepairBench leverages two high-quality benchmarks, Defects4J and GitBug-Java, to evaluate frontier models against real-world program repair tasks. We publicly release the evaluation framework of RepairBench. We will update the leaderboard as new frontier models are released.
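
To illustrate what "execution-based" means in the abstract above, here is a minimal toy sketch in Python. It is not the RepairBench framework, which targets Java benchmarks. The task (a function named `add`), the `TESTS` list, and the three outcome labels are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy test suite: (arguments, expected result) for a function `add`.
TESTS: List[Tuple[Tuple[int, int], int]] = [((1, 2), 3), ((-1, 1), 0), ((0, 0), 0)]


def evaluate_patch(source: str) -> str:
    """Build a candidate patch and run it against the test suite.

    Returns 'build_failure', 'test_failure', or 'passes_tests'
    (labels are assumptions made for this sketch).
    """
    namespace: Dict[str, object] = {}
    try:
        # "Compile" step: reject patches that do not even load.
        exec(compile(source, "<patch>", "exec"), namespace)
    except Exception:
        return "build_failure"

    func = namespace.get("add")
    if not callable(func):
        return "build_failure"

    # "Execute" step: every test must pass for the patch to count.
    for args, expected in TESTS:
        try:
            if func(*args) != expected:
                return "test_failure"
        except Exception:
            return "test_failure"
    return "passes_tests"


candidates = {
    "correct": "def add(a, b):\n    return a + b\n",
    "wrong": "def add(a, b):\n    return a - b\n",
    "broken": "def add(a, b) return a + b\n",
}
results = {name: evaluate_patch(src) for name, src in candidates.items()}
print(results)
```

The point of the sketch is only that a patch is judged by building and running it, not by comparing its text to a reference fix.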

DISL: Fueling Research with A Large Dataset of Solidity Smart Contracts

Mar 26, 2024

Gabriele Morello, Mojtaba Eshghie, Sofia Bobadilla, Martin Monperrus

Abstract: The DISL dataset features a collection of 514,506 unique Solidity files that have been deployed to Ethereum mainnet. It caters to the need for a large and diverse dataset of real-world smart contracts. DISL serves as a resource for developing machine learning systems and for benchmarking software engineering tools designed for smart contracts. By aggregating every verified smart contract from Etherscan up to January 15, 2024, DISL surpasses existing datasets in size and recency.

Generative AI to Generate Test Data Generators

Jan 31, 2024

Benoit Baudry, Khashayar Etemadi, Sen Fang, Yogya Gamage, Yi Liu, Yuxin Liu, Martin Monperrus, Javier Ron, André Silva, Deepika Tiwari

Abstract: Generating fake data is an essential dimension of modern software testing, as demonstrated by the number and significance of data faking libraries. Yet, developers of faking libraries cannot keep up with the wide range of data to be generated for different natural languages and domains. In this paper, we assess the ability of generative AI for generating test data in different domains. We design three types of prompts for Large Language Models (LLMs), which perform test data generation tasks at different levels of integrability: 1) raw test data generation, 2) synthesizing programs in a specific language that generate useful test data, and 3) producing programs that use state-of-the-art faker libraries. We evaluate our approach by prompting LLMs to generate test data for 11 domains. The results show that LLMs can successfully generate realistic test data generators in a wide range of domains at all three levels of integrability.

RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair

Dec 25, 2023

André Silva, Sen Fang, Martin Monperrus

Abstract: Automated Program Repair (APR) has evolved significantly with the advent of Large Language Models (LLMs). Fine-tuning LLMs for program repair is a recent avenue of research, with many dimensions which have not been explored. Existing work mostly fine-tunes LLMs with naive code representations and is fundamentally limited in its ability to fine-tune larger LLMs. To address this problem, we propose RepairLLaMA, a novel program repair approach that combines 1) code representations for APR and 2) the state-of-the-art parameter-efficient LLM fine-tuning technique called LoRA. This results in RepairLLaMA producing a highly effective 'program repair adapter' for fixing bugs with language models. Our experiments demonstrate the validity of both concepts. First, fine-tuning adapters with program repair specific code representations enables the model to use meaningful repair signals. Second, parameter-efficient fine-tuning helps fine-tuning to converge and contributes to the effectiveness of the repair adapter to fix data-points outside the fine-tuning data distribution. Overall, RepairLLaMA correctly fixes 125 Defects4J v2 and 82 HumanEval-Java bugs, outperforming all baselines.

Supersonic: Learning to Generate Source Code Optimizations in C/C++

Oct 02, 2023

Zimin Chen, Sen Fang, Martin Monperrus

Abstract: Software optimization refines programs for resource efficiency while preserving functionality. Traditionally, it is a process done by developers and compilers. This paper introduces a third option, automated optimization at the source code level. We present Supersonic, a neural approach targeting minor source code modifications for optimization. Using a seq2seq model, Supersonic is trained on C/C++ program pairs $(x_t, x_{t+1})$, where $x_{t+1}$ is an optimized version of $x_t$, and outputs a diff. Supersonic's performance is benchmarked against OpenAI's GPT-3.5-Turbo and GPT-4 on competitive programming tasks. The experiments show that Supersonic not only outperforms both models on the code optimization task but also minimizes the extent of the change with a model more than 600x smaller than GPT-3.5-Turbo and 3700x smaller than GPT-4.
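
As a rough illustration of the program-pair and diff idea in the abstract above, the sketch below computes a unified diff between a hypothetical pair of C functions using Python's standard `difflib`. The two C snippets are invented for this example and do not come from the paper's dataset, and the unified diff format is an assumption: the exact diff format Supersonic emits is not stated in the abstract.

```python
import difflib

# Hypothetical program pair: x_t1 hoists strlen() out of the loop condition.
x_t = (
    "int count_a(const char *s) {\n"
    "    int count = 0;\n"
    "    for (size_t i = 0; i < strlen(s); i++)\n"
    "        count += (s[i] == 'a');\n"
    "    return count;\n"
    "}\n"
)
x_t1 = (
    "int count_a(const char *s) {\n"
    "    int count = 0;\n"
    "    size_t n = strlen(s);\n"
    "    for (size_t i = 0; i < n; i++)\n"
    "        count += (s[i] == 'a');\n"
    "    return count;\n"
    "}\n"
)


def make_diff(before: str, after: str) -> str:
    """Return a unified diff that turns `before` into `after`."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile="x_t",
        tofile="x_t+1",
    )
    return "".join(lines)


diff_text = make_diff(x_t, x_t1)
print(diff_text)
```

A diff of this kind is much shorter than the full optimized program when the change is small, which is consistent with the abstract's focus on minor modifications.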

Self-Supervised Learning to Prove Equivalence Between Programs via Semantics-Preserving Rewrite Rules

Sep 22, 2021

Steve Kommrusch, Martin Monperrus, Louis-Noël Pouchet

Abstract: We target the problem of synthesizing proofs of semantic equivalence between two programs made of sequences of statements with complex symbolic expressions. We propose a neural network architecture based on the transformer to generate axiomatic proofs of equivalence between program pairs. We generate expressions which include scalars and vectors and support multi-typed rewrite rules to prove equivalence. For training the system, we develop an original training technique, which we call self-supervised sample selection. This incremental training improves the quality, generalizability and extensibility of the learned model. We study the effectiveness of the system to generate proofs of increasing length, and we demonstrate how transformer models learn to represent complex and verifiable symbolic reasoning. Our system, S4Eq, achieves 97% proof success on 10,000 pairs of programs while ensuring zero false positives by design.

* 18 pages

Multimodal Representation for Neural Code Search

Jul 23, 2021

Jian Gu, Zimin Chen, Martin Monperrus

Abstract: Semantic code search is about finding semantically relevant code snippets for a given natural language query. In the state-of-the-art approaches, the semantic similarity between code and query is quantified as the distance of their representation in the shared vector space. In this paper, to improve the vector space, we introduce tree-serialization methods on a simplified form of AST and build the multimodal representation for the code data. We conduct extensive experiments using a single corpus that is large-scale and multi-language: CodeSearchNet. Our results show that both our tree-serialized representations and multimodal learning model improve the performance of code search. Last, we define intuitive quantification metrics oriented to the completeness of semantic and syntactic information of the code data, to help understand the experimental findings.

* 12 pages, 9 figures, accepted by ICSME 2021, the camera-ready version
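
The abstract above describes ranking code by its distance to the query in a shared vector space. A minimal sketch of that retrieval step follows, assuming cosine similarity as the measure (the abstract does not name one) and using made-up three-dimensional vectors and snippet names in place of learned embeddings.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_snippets(query_vec: Sequence[float],
                  snippet_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return snippet names ordered from most to least similar to the query."""
    return sorted(
        snippet_vecs,
        key=lambda name: cosine_similarity(query_vec, snippet_vecs[name]),
        reverse=True,
    )


# Hypothetical embeddings; a real system would obtain these from trained encoders.
query = [1.0, 0.0, 1.0]
snippets = {
    "read_file": [0.9, 0.1, 0.8],
    "sort_list": [0.0, 1.0, 0.1],
    "parse_json": [0.5, 0.5, 0.5],
}
ranking = rank_snippets(query, snippets)
print(ranking)
```

In this framing, improving the code representation changes only how the snippet vectors are produced; the ranking step stays the same.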

Neural Transfer Learning for Repairing Security Vulnerabilities in C Code

Apr 16, 2021

Zimin Chen, Steve Kommrusch, Martin Monperrus

Abstract: In this paper, we address the problem of automatic repair of software vulnerabilities with deep learning. The major problem with data-driven vulnerability repair is that the few existing datasets of known confirmed vulnerabilities consist of only a few thousand examples. However, training a deep learning model often requires hundreds of thousands of examples. In this work, we leverage the intuition that the bug fixing task and the vulnerability fixing task are related, and the knowledge learned from bug fixes can be transferred to fixing vulnerabilities. In the machine learning community, this technique is called transfer learning. In this paper, we propose an approach for repairing security vulnerabilities named VRepair which is based on transfer learning. VRepair is first trained on a large bug fix corpus, and is then tuned on a vulnerability fix dataset, which is an order of magnitude smaller. In our experiments, we show that a model trained only on a bug fix corpus can already fix some vulnerabilities. Then, we demonstrate that transfer learning improves the ability to repair vulnerable C functions. In the end, we present evidence that transfer learning produces more stable and superior neural models for vulnerability repair.

Using Sequence-to-Sequence Learning for Repairing C Vulnerabilities

Dec 04, 2019

Zimin Chen, Steve Kommrusch, Martin Monperrus

Abstract: Software vulnerabilities affect all businesses and research is being done to avoid, detect or repair them. In this article, we contribute a new technique for automatic vulnerability fixing. We present a system that uses the rich software development history that can be found on GitHub to train an AI system that generates patches. We apply sequence-to-sequence learning on a big dataset of code changes and we evaluate the trained system on real world vulnerabilities from the CVE database. The result shows the feasibility of using sequence-to-sequence learning for fixing software vulnerabilities.

Learning the Relation between Code Features and Code Transforms with Structured Prediction

Jul 22, 2019

Zhongxing Yu, Matias Martinez, Tegawendé F. Bissyandé, Martin Monperrus

Abstract: We present in this paper the first approach for structurally predicting code transforms at the level of AST nodes using conditional random fields. Our approach first learns offline a probabilistic model that captures how certain code transforms are applied to certain AST nodes, and then uses the learned model to predict transforms for new, unseen code snippets. We implement our approach in the context of repair transform prediction for Java programs. Our implementation contains a set of carefully designed code features, deals with the training data imbalance issue, and comprises transform constraints that are specific to code. We conduct a large-scale experimental evaluation based on a dataset of 4,590,679 bug fixing commits from real-world Java projects. The experimental results show that our approach predicts the code transforms with a success rate varying from 37.1% to 61.1% depending on the transforms.
