Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Data Augmentation Approaches for Source Code Models: A Survey

Jun 12, 2023

Terry Yue Zhuo, Zhou Yang, Zhensu Sun, Yufei Wang, Li Li, Xiaoning Du, Zhenchang Xing, David Lo

Figure 1 for Data Augmentation Approaches for Source Code Models: A Survey

Figure 2 for Data Augmentation Approaches for Source Code Models: A Survey

Figure 3 for Data Augmentation Approaches for Source Code Models: A Survey

Figure 4 for Data Augmentation Approaches for Source Code Models: A Survey

Share this with someone who'll enjoy it:

Abstract:The increasingly popular adoption of source code in many critical tasks motivates the development of data augmentation (DA) techniques to enhance training data and improve various capabilities (e.g., robustness and generalizability) of these models. Although a series of DA methods have been proposed and tailored for source code models, there lacks a comprehensive survey and examination to understand their effectiveness and implications. This paper fills this gap by conducting a comprehensive and integrative survey of data augmentation for source code, wherein we systematically compile and encapsulate existing literature to provide a comprehensive overview of the field. We start by constructing a taxonomy of DA for source code models model approaches, followed by a discussion on prominent, methodologically illustrative approaches. Next, we highlight the general strategies and techniques to optimize the DA quality. Subsequently, we underscore techniques that find utility in widely-accepted source code scenarios and downstream tasks. Finally, we outline the prevailing challenges and potential opportunities for future research. In essence, this paper endeavors to demystify the corpus of existing literature on DA for source code models, and foster further exploration in this sphere. Complementing this, we present a continually updated GitHub repository that hosts a list of update-to-date papers on DA for source code models, accessible at \url{https://github.com/terryyz/DataAug4Code}.

* Technical Report

View paper on

Share this with someone who'll enjoy it:

Title:Data Augmentation Approaches for Source Code Models: A Survey

Paper and Code