Title: Benchmarking Data Science Agents

Feb 27, 2024

Yuge Zhang, Qiyang Jiang, Xingyu Han, Nan Chen, Yuqing Yang, Kan Ren

Abstract: In the era of data-driven decision-making, the complexity of data analysis necessitates advanced expertise and tools of data science, presenting significant challenges even for specialists. Large Language Models (LLMs) have emerged as promising aids as data science agents, assisting humans in data analysis and processing. Yet their practical efficacy remains constrained by the varied demands of real-world applications and complicated analytical processes. In this paper, we introduce DSEval -- a novel evaluation paradigm, as well as a series of innovative benchmarks tailored for assessing the performance of these agents throughout the entire data science lifecycle. Incorporating a novel bootstrapped annotation method, we streamline dataset preparation, improve the evaluation coverage, and expand benchmarking comprehensiveness. Our findings uncover prevalent obstacles and provide critical insights to inform future advancements in the field.

* Source code and data are available at https://github.com/MetaCopilot/dseval
