Title: CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation

Oct 08, 2023

Weixiang Yan, Yuchen Tian, Yunzhe Li, Qian Chen, Wen Wang

Abstract: Recent code translation techniques use neural machine translation models to translate source code from one programming language to another, either to satisfy production compatibility requirements or to make codebase maintenance more efficient. Most existing code translation datasets focus on only a single pair of popular programming languages. To advance research on code translation and meet the diverse requirements of real-world applications, we construct CodeTransOcean, a large-scale comprehensive benchmark that supports the largest variety of languages for code translation. CodeTransOcean consists of three novel multilingual datasets: MultilingualTrans, which supports translation between multiple popular programming languages; NicheTrans, for translating between niche programming languages and popular ones; and LLMTrans, for evaluating the compilability of code translated by large language models (LLMs). CodeTransOcean also includes a novel cross-framework dataset, DLTrans, for translating deep learning code across different frameworks. We develop multilingual modeling approaches for code translation and demonstrate their great potential for improving translation quality on both low-resource and high-resource language pairs and for boosting training efficiency. We also propose a novel evaluation metric, Debugging Success Rate@K, for program-level code translation. Finally, we evaluate the LLM ChatGPT on our datasets and investigate its potential for fuzzy compilation predictions. We build baselines for CodeTransOcean and analyze the challenges of code translation to guide future research.
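
The abstract names Debugging Success Rate@K but does not define it. As a rough illustration only, the sketch below assumes the metric is the fraction of translated programs that run and produce the expected output within K attempts. That reading, the function name `debugging_success_rate_at_k`, the input encoding, and the sample data are all assumptions made for this example and are not taken from the abstract.

```python
from typing import Optional, Sequence


def debugging_success_rate_at_k(
    first_success_attempt: Sequence[Optional[int]], k: int
) -> float:
    """Fraction of samples that succeed within k attempts.

    Assumed encoding (hypothetical, not from the abstract):
    first_success_attempt[i] is the 1-based attempt at which sample i
    first ran and produced the expected output, or None if it never did.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not first_success_attempt:
        raise ValueError("at least one sample is required")

    # A sample counts as a success if it passed on or before attempt k.
    successes = sum(
        1 for attempt in first_success_attempt
        if attempt is not None and attempt <= k
    )
    return successes / len(first_success_attempt)


# Hypothetical outcomes for four translated programs.
attempts = [1, 3, None, 2]
for k in (1, 2, 3):
    print(f"DSR@{k} = {debugging_success_rate_at_k(attempts, k):.2f}")
```

Under this reading the score can only stay the same or rise as K grows, since every extra attempt gives a failing sample another chance to pass.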

* Accepted by Findings of EMNLP 2023
