Johannes Villmow

Addressing Leakage in Self-Supervised Contextualized Code Retrieval

Apr 17, 2022

Johannes Villmow, Viola Campos, Adrian Ulges, Ulrich Schwanecke

Abstract: We address contextualized code retrieval, the search for code snippets helpful to fill gaps in a partial input program. Our approach facilitates a large-scale self-supervised contrastive training by splitting source code randomly into contexts and targets. To combat leakage between the two, we suggest a novel approach based on mutual identifier masking, dedentation, and the selection of syntax-aligned targets. Our second contribution is a new dataset for direct evaluation of contextualized code retrieval, based on a dataset of manually aligned subpassages of code clones. Our experiments demonstrate that our approach improves retrieval substantially, and yields new state-of-the-art results for code clone and defect detection.

* 4 pages, 5 figures
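
The context/target split described in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the `<GAP>` and `<MASK>` markers, and the choice to mask shared identifiers only on the target side are assumptions made for illustration, and the selection of syntax-aligned targets is not modeled (the span boundaries are simply passed in, where training would sample them randomly).

```python
import keyword
import re
import textwrap

# Rough identifier pattern; a real system would use a language-aware tokenizer.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def identifiers(code):
    """Return the set of non-keyword identifiers in a code string."""
    return {tok for tok in IDENT.findall(code) if not keyword.iskeyword(tok)}


def split_and_mask(lines, start, end, mask="<MASK>", gap="<GAP>"):
    """Cut lines[start:end] out as the target; the remaining lines form the context.

    Hypothetical simplification: identifiers that occur on both sides are
    masked in the target only, and the target is dedented so its indentation
    level does not reveal where it was cut from.
    """
    target = textwrap.dedent("\n".join(lines[start:end]))
    context_lines = lines[:start] + lines[end:]
    shared = identifiers(target) & identifiers("\n".join(context_lines))

    def replace_shared(match):
        token = match.group(0)
        return mask if token in shared else token

    masked_target = IDENT.sub(replace_shared, target)
    context = "\n".join(lines[:start] + [gap] + lines[end:])
    return context, masked_target


example = [
    "def total(prices):",
    "    result = 0",
    "    for p in prices:",
    "        result += p",
    "    return result",
]
context, target = split_and_mask(example, 2, 4)
print(context)
print(target)
```

In this toy run, `prices` and `result` appear on both sides of the split, so they are masked in the target, while `p` occurs only in the target and is kept.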

An Open-World Extension to Knowledge Graph Completion Models

Jun 19, 2019

Haseeb Shah, Johannes Villmow, Adrian Ulges, Ulrich Schwanecke, Faisal Shafait

Abstract: We present a novel extension to embedding-based knowledge graph completion models which enables them to perform open-world link prediction, i.e. to predict facts for entities unseen in training based on their textual description. Our model combines a regular link prediction model learned from a knowledge graph with word embeddings learned from a textual corpus. After training both independently, we learn a transformation to map the embeddings of an entity's name and description to the graph-based embedding space. In experiments on several datasets including FB20k, DBPedia50k and our new dataset FB15k-237-OWE, we demonstrate competitive results. Particularly, our approach exploits the full knowledge graph structure even when textual descriptions are scarce, does not require a joint training on graph and text, and can be applied to any embedding-based link prediction model, such as TransE, ComplEx and DistMult.

* 8 pages, accepted to AAAI-2019
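
The pipeline in the abstract above (embed an unseen entity's text, map it into the graph embedding space, then score candidate facts with an existing link prediction model) can be sketched in a few lines. Everything concrete here is an assumption for illustration: the word vectors, the hand-set matrix `W` standing in for the learned transformation, averaging as the way to aggregate word embeddings, and the toy entities and relation. TransE with an L1 distance is used as the scoring model because the abstract names it as one compatible choice.

```python
def mean_vector(vectors):
    """Average a list of equal-length vectors (a simple text aggregation)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def apply_linear(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def transe_score(head, relation, tail):
    """TransE plausibility: negative L1 distance between head + relation and tail."""
    return -sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


# Toy word embeddings (3-d) for the description of an entity unseen in training.
word_vecs = {"blue": [1.0, 0.0, 0.0], "whale": [0.0, 1.0, 0.0]}

# Stand-in for the learned transformation from text space (3-d) to graph space (2-d).
# In the approach described above this map is learned; here it is fixed by hand.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

# Toy graph embeddings from a pretrained closed-world link prediction model.
relations = {"is_a": [0.5, 0.5]}
entities = {"mammal": [1.0, 1.0], "city": [-1.0, -1.0]}

text_emb = mean_vector([word_vecs[w] for w in ["blue", "whale"]])
head = apply_linear(W, text_emb)  # unseen entity placed in graph space

scores = {name: transe_score(head, relations["is_a"], vec)
          for name, vec in entities.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Because the graph model and the word embeddings are used as-is and only the mapping connects them, the same sketch would work with a different scoring function in place of `transe_score`.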
