Title: Data Augmentation for Code Translation with Comparable Corpora and Multiple References

Nov 01, 2023

Yiqing Xie, Atharva Naik, Daniel Fried, Carolyn Rose

Abstract: One major challenge of translating code between programming languages is that parallel training data is often limited. To overcome this challenge, we present two data augmentation techniques: one that builds comparable corpora (i.e., code pairs with similar functionality), and another that augments existing parallel data with multiple reference translations. Specifically, we build and analyze multiple types of comparable corpora, including programs generated from natural language documentation using a code generation model. Furthermore, to reduce overfitting to a single reference translation, we automatically generate additional translation references for available parallel data and filter the translations by unit tests, which increases variation in target translations. Experiments show that our data augmentation techniques significantly improve CodeT5 for translation between Java, Python, and C++ by an average of 7.5% in Computational Accuracy (CA@1), a metric that verifies the correctness of translations by execution. The code is available at https://github.com/Veronicium/CMTrans.

* EMNLP 2023 Findings
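
The unit-test filtering step and the execution-based CA@1 metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes, purely for illustration, that candidate translations are Python source strings defining a single function, and that tests are (arguments, expected value) pairs. The names `passes_tests`, `filter_references`, and `ca_at_1` are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical test format: (positional arguments, expected return value).
TestCase = Tuple[Tuple[Any, ...], Any]


def passes_tests(source: str, entry_point: str, tests: Sequence[TestCase]) -> bool:
    """Return True if `source` defines `entry_point` and it passes every test."""
    namespace: Dict[str, Any] = {}
    try:
        # Assumption: candidates are trusted. There is no sandboxing here.
        exec(source, namespace)
        func = namespace[entry_point]
        return all(func(*args) == expected for args, expected in tests)
    except Exception:
        # Syntax errors, missing functions, and runtime errors count as failures.
        return False


def filter_references(
    candidates: Sequence[str], entry_point: str, tests: Sequence[TestCase]
) -> List[str]:
    """Keep distinct candidates that pass all unit tests, preserving order."""
    kept: List[str] = []
    for source in candidates:
        if source not in kept and passes_tests(source, entry_point, tests):
            kept.append(source)
    return kept


def ca_at_1(top1_passed: Sequence[bool]) -> float:
    """Fraction of problems whose top-1 translation passes all its tests."""
    return sum(top1_passed) / len(top1_passed) if top1_passed else 0.0


# Toy example: four generated candidates for one target function.
candidates = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return sum((a, b))\n",
    "def add(a, b):\n    return a - b\n",   # wrong behavior
    "def add(a, b)\n    return a + b\n",    # syntax error
]
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

kept = filter_references(candidates, "add", tests)
score = ca_at_1([True, False, True, True])
```

In this toy run, the two functionally correct candidates survive as additional references, while the incorrect and non-compiling ones are discarded. A real pipeline for Java or C++ targets would need to compile and run candidates in an isolated process instead of calling `exec`.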
