Steve Kommrusch

Self-Supervised Learning to Prove Equivalence Between Programs via Semantics-Preserving Rewrite Rules

Sep 22, 2021

Steve Kommrusch, Martin Monperrus, Louis-Noël Pouchet

Abstract: We target the problem of synthesizing proofs of semantic equivalence between two programs made of sequences of statements with complex symbolic expressions. We propose a neural network architecture based on the transformer to generate axiomatic proofs of equivalence between program pairs. We generate expressions which include scalars and vectors and support multi-typed rewrite rules to prove equivalence. For training the system, we develop an original training technique, which we call self-supervised sample selection. This incremental training improves the quality, generalizability and extensibility of the learned model. We study the effectiveness of the system to generate proofs of increasing length, and we demonstrate how transformer models learn to represent complex and verifiable symbolic reasoning. Our system, S4Eq, achieves 97% proof success on 10,000 pairs of programs while ensuring zero false positives by design.

* 18 pages

Proving Equivalence Between Complex Expressions Using Graph-to-Sequence Neural Models

Jun 09, 2021

Steve Kommrusch, Théo Barollet, Louis-Noël Pouchet

Abstract: We target the problem of provably computing the equivalence between two complex expression trees. To this end, we formalize the problem of equivalence between two such programs as finding a set of semantics-preserving rewrite rules from one into the other, such that after the rewrite the two programs are structurally identical, and therefore trivially equivalent. We then develop a graph-to-sequence neural network system for program equivalence, trained to produce such rewrite sequences from a carefully crafted automatic example generation algorithm. We extensively evaluate our system on a rich multi-type linear algebra expression language, using arbitrary combinations of 100+ graph-rewriting axioms of equivalence. Our machine learning system guarantees correctness for all true negatives, and ensures zero false positives by design. It outputs via inference a valid proof of equivalence for 93% of the 10,000 equivalent expression pairs isolated for testing, using up to 50-term expressions. In all cases, the validity of the sequence produced, and therefore the provable assertion of program equivalence, is computable in negligible time.

* 10 pages (24 including references and appendices), 8 figures, 17 tables. arXiv admin note: substantial text overlap with arXiv:2002.06799. Updated to include funding acknowledgement
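
The abstract above describes proofs as sequences of semantics-preserving rewrites that make two expressions structurally identical, and notes that checking such a sequence is cheap. A minimal sketch of that checking step follows. It assumes a toy expression encoding (nested tuples) and two illustrative scalar axioms; the names `commute`, `distribute_left`, `rewrite_at` and `check_proof` are hypothetical and are not taken from the paper, whose language has 100+ axioms.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple, Union

# Assumed encoding: a leaf is a variable name; an inner node is
# (operator, operand, operand). Index 0 is the operator, so operand
# positions in a path are 1 and 2.
Expr = Union[str, Tuple["Expr", ...]]
Path = Tuple[int, ...]
Rule = Callable[[Expr], Optional[Expr]]


def commute(e: Expr) -> Optional[Expr]:
    """a + b -> b + a and a * b -> b * a (scalars only in this toy)."""
    if isinstance(e, tuple) and len(e) == 3 and e[0] in ("+", "*"):
        return (e[0], e[2], e[1])
    return None


def distribute_left(e: Expr) -> Optional[Expr]:
    """a * (b + c) -> (a * b) + (a * c)."""
    if isinstance(e, tuple) and len(e) == 3 and e[0] == "*":
        a, rhs = e[1], e[2]
        if isinstance(rhs, tuple) and len(rhs) == 3 and rhs[0] == "+":
            return ("+", ("*", a, rhs[1]), ("*", a, rhs[2]))
    return None


# Hypothetical axiom table; a real system would hold many more rules.
RULES: Dict[str, Rule] = {
    "commute": commute,
    "distribute_left": distribute_left,
}


def rewrite_at(expr: Expr, path: Path, rule: Rule) -> Optional[Expr]:
    """Apply `rule` to the node reached by `path`; None if it does not apply."""
    if not path:
        return rule(expr)
    if not isinstance(expr, tuple):
        return None
    i = path[0]
    if i < 1 or i >= len(expr):
        return None
    child = rewrite_at(expr[i], path[1:], rule)
    if child is None:
        return None
    return expr[:i] + (child,) + expr[i + 1:]


def check_proof(source: Expr, target: Expr,
                proof: Sequence[Tuple[str, Path]]) -> bool:
    """Accept only if every step is a legal rewrite and the result equals target."""
    current = source
    for name, path in proof:
        rule = RULES.get(name)
        if rule is None:
            return False
        rewritten = rewrite_at(current, path, rule)
        if rewritten is None:
            return False
        current = rewritten
    return current == target


source = ("*", "a", ("+", "b", "c"))
target = ("+", ("*", "a", "c"), ("*", "a", "b"))
proof = [("commute", (2,)), ("distribute_left", ())]
print(check_proof(source, target, proof))
```

Because the checker only accepts sequences in which every step is a legal axiom application, a wrong proposal from a model is rejected rather than reported as a proof, which is one way to read the "zero false positives by design" claim. The check is a single pass over the proof steps.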

Neural Transfer Learning for Repairing Security Vulnerabilities in C Code

Apr 16, 2021

Zimin Chen, Steve Kommrusch, Martin Monperrus

Abstract: In this paper, we address the problem of automatic repair of software vulnerabilities with deep learning. The major problem with data-driven vulnerability repair is that the few existing datasets of known confirmed vulnerabilities consist of only a few thousand examples. However, training a deep learning model often requires hundreds of thousands of examples. In this work, we leverage the intuition that the bug fixing task and the vulnerability fixing task are related, and the knowledge learned from bug fixes can be transferred to fixing vulnerabilities. In the machine learning community, this technique is called transfer learning. In this paper, we propose an approach for repairing security vulnerabilities named VRepair which is based on transfer learning. VRepair is first trained on a large bug fix corpus, and is then tuned on a vulnerability fix dataset, which is an order of magnitude smaller. In our experiments, we show that a model trained only on a bug fix corpus can already fix some vulnerabilities. Then, we demonstrate that transfer learning improves the ability to repair vulnerable C functions. In the end, we present evidence that transfer learning produces more stable and superior neural models for vulnerability repair.

Equivalence of Dataflow Graphs via Rewrite Rules Using a Graph-to-Sequence Neural Model

Feb 17, 2020

Steve Kommrusch, Théo Barollet, Louis-Noël Pouchet

Abstract: In this work we target the problem of provably computing the equivalence between two programs represented as dataflow graphs. To this end, we formalize the problem of equivalence between two programs as finding a set of semantics-preserving rewrite rules from one into the other, such that after the rewrite the two programs are structurally identical, and therefore trivially equivalent. We then develop the first graph-to-sequence neural network system for program equivalence, trained to produce such rewrite sequences from a carefully crafted automatic example generation algorithm. We extensively evaluate our system on a rich multi-type linear algebra expression language, using arbitrary combinations of 100+ graph-rewriting axioms of equivalence. Our system outputs via inference a correct rewrite sequence for 96% of the 10,000 program pairs isolated for testing, using 30-term programs. And in all cases, the validity of the sequence produced and therefore the provable assertion of program equivalence is computable, in negligible time.

* 20 pages including references and appendices, 10 figures

Using Sequence-to-Sequence Learning for Repairing C Vulnerabilities

Dec 04, 2019

Zimin Chen, Steve Kommrusch, Martin Monperrus

Abstract: Software vulnerabilities affect all businesses and research is being done to avoid, detect or repair them. In this article, we contribute a new technique for automatic vulnerability fixing. We present a system that uses the rich software development history that can be found on GitHub to train an AI system that generates patches. We apply sequence-to-sequence learning on a big dataset of code changes and we evaluate the trained system on real world vulnerabilities from the CVE database. The result shows the feasibility of using sequence-to-sequence learning for fixing software vulnerabilities.

SequenceR: Sequence-to-Sequence Learning for End-to-End Program Repair

Dec 24, 2018

Zimin Chen, Steve Kommrusch, Michele Tufano, Louis-Noël Pouchet, Denys Poshyvanyk, Martin Monperrus

Abstract: This paper presents a novel end-to-end approach to program repair based on sequence-to-sequence learning. We devise, implement, and evaluate a system, called SequenceR, for fixing bugs based on sequence-to-sequence learning on source code. This approach uses the copy mechanism to overcome the unlimited vocabulary problem that occurs with big code. Our system is data-driven; we train it on 35,578 commits, carefully curated from open-source repositories. We evaluate it on 4,711 independent real bug fixes, as well as on the Defects4J benchmark used in program repair research. SequenceR is able to perfectly predict the fixed line for 950/4711 testing samples. It captures a wide range of repair operators without any domain-specific top-down design.

* 21 pages, 15 figures

Synthetic Lung Nodule 3D Image Generation Using Autoencoders

Nov 19, 2018

Steve Kommrusch, Louis-Noël Pouchet

Abstract: One of the challenges of using machine learning techniques with medical data is the frequent dearth of source image data on which to train. A representative example is automated lung cancer diagnosis, where nodule images need to be classified as suspicious or benign. In this work we propose an automatic synthetic lung nodule image generator. Our 3D shape generator is designed to augment the variety of 3D images. Our proposed system takes root in autoencoder techniques, and we provide extensive experimental characterization that demonstrates its ability to produce quality synthetic images.

* 19 pages, 12 figures, full paper for work initially presented at IJCAI 2018
