
Title: DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation

Nov 18, 2022

Yuhang Lai, Chengxi Li, Yiming Wang, Tianyi Zhang, Ruiqi Zhong, Luke Zettlemoyer, Scott Wen-tau Yih, Daniel Fried, Sida Wang, Tao Yu



Abstract: We introduce DS-1000, a code generation benchmark with a thousand data science problems spanning seven Python libraries, such as NumPy and Pandas. Compared to prior work, DS-1000 incorporates three core features. First, our problems reflect diverse, realistic, and practical use cases, since we collected them from StackOverflow. Second, our automatic evaluation is highly specific (reliable): across all Codex-002-predicted solutions that our evaluation accepts, only 1.8% are incorrect. We achieve this with multi-criteria metrics, checking both functional correctness by running test cases and surface-form constraints by restricting API usage or keywords. Finally, we proactively defend against memorization by slightly modifying our problems to be different from the original StackOverflow source; consequently, models cannot answer them correctly by memorizing the solutions from pre-training. The current best public system (Codex-002) achieves 43.3% accuracy, leaving ample room for improvement. We release our benchmark at https://ds1000-code-gen.github.io.
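To make the multi-criteria evaluation described in the abstract more concrete, here is a minimal Python sketch of how a functional check (running test cases) might be combined with a surface-form check (restricting keywords or API names). This is not the benchmark's actual harness: the function names `surface_form_ok`, `functionally_correct`, and `accept`, the `result` variable convention, and the toy problem are all hypothetical and chosen only for illustration.

```python
import io
import tokenize


def surface_form_ok(code, banned=(), required=()):
    """Check keyword/API-name constraints on the source text itself."""
    try:
        names = {
            tok.string
            for tok in tokenize.generate_tokens(io.StringIO(code).readline)
            if tok.type == tokenize.NAME  # identifiers and keywords alike
        }
    except (tokenize.TokenError, SyntaxError):
        return False  # code that cannot be tokenized is rejected
    return not (names & set(banned)) and set(required) <= names


def functionally_correct(code, test_cases, result_var="result"):
    """Run the candidate on each test input and compare with the expected value."""
    for inputs, expected in test_cases:
        namespace = dict(inputs)
        try:
            exec(code, namespace)  # assumption: a real harness would sandbox this
        except Exception:
            return False
        if namespace.get(result_var) != expected:
            return False
    return True


def accept(code, test_cases, banned=(), required=()):
    """Accept a candidate only if it passes both criteria."""
    return surface_form_ok(code, banned, required) and functionally_correct(
        code, test_cases
    )


# Hypothetical problem: sum a list without writing an explicit loop.
tests = [({"xs": [1, 2, 3]}, 6), ({"xs": []}, 0)]
vectorized = "result = sum(xs)"
looped = "result = 0\nfor x in xs:\n    result += x"

print(accept(vectorized, tests, banned=("for", "while")))  # True
print(accept(looped, tests, banned=("for", "while")))      # False
```

The looped candidate produces the right answers but is still rejected, which is the kind of case that a test-only metric would wrongly accept when the problem asks for a loop-free solution.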
