Title: Mercury: An Efficiency Benchmark for LLM Code Synthesis

Feb 12, 2024

Mingzhe Du, Anh Tuan Luu, Bin Ji, See-Kiong Ng


Abstract: Despite advancements in evaluating Large Language Models (LLMs) for code synthesis, benchmarks have predominantly focused on functional correctness, overlooking the importance of code efficiency. We present Mercury, the first benchmark designed to assess the efficiency of code produced by LLMs in code synthesis tasks. Mercury consists of 1,889 programming tasks spanning diverse difficulty levels, together with test case generators that can produce an unlimited number of test cases for comprehensive evaluation. Unlike existing benchmarks, Mercury introduces a novel metric, Beyond@K, which measures normalized code efficiency relative to historical submissions. This yields a new evaluation indicator for code synthesis that encourages generating code that is both functionally correct and computationally efficient, mirroring real-world software development standards. Our findings reveal that while LLMs demonstrate a remarkable capability to generate functionally correct code, a substantial gap remains in the efficiency of the code they produce, underscoring a new frontier for LLM research and development.
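The abstract does not give the Beyond@K formula, so the sketch below only illustrates the general idea of "normalized code efficiency based on historical submissions". It assumes a min-max normalization of a generated solution's runtime against the runtimes of historical (reference) submissions for the same task, a score of 0 for functionally incorrect code, and a best-of-first-k aggregation. The function names `normalized_efficiency` and `beyond_at_k_sketch` and this exact scoring rule are hypothetical, not the paper's definition.

```python
from typing import Optional, Sequence


def normalized_efficiency(runtime: Optional[float], reference_runtimes: Sequence[float]) -> float:
    """Hypothetical normalized efficiency score in [0, 1].

    1.0 means as fast as (or faster than) the fastest historical submission,
    0.0 means as slow as (or slower than) the slowest one.
    `runtime` is None when the generated code fails the functional tests.
    """
    if runtime is None:
        # Assumption: incorrect code earns no efficiency credit.
        return 0.0
    fastest, slowest = min(reference_runtimes), max(reference_runtimes)
    if slowest == fastest:
        # Degenerate reference set: full credit only if at least as fast.
        return 1.0 if runtime <= fastest else 0.0
    # Clip into the historical range, then min-max normalize (faster = higher).
    clipped = min(max(runtime, fastest), slowest)
    return (slowest - clipped) / (slowest - fastest)


def beyond_at_k_sketch(
    sample_runtimes: Sequence[Optional[float]],
    reference_runtimes: Sequence[float],
    k: int,
) -> float:
    """Hypothetical @k aggregation: best normalized score among the first k samples."""
    scores = [normalized_efficiency(r, reference_runtimes) for r in sample_runtimes[:k]]
    return max(scores, default=0.0)
```

For example, with historical runtimes `[1.0, 2.0, 3.0]` and three samples whose runtimes are `[None, 2.5, 1.5]` (the first one incorrect), the sketch scores 0.0 at k=1, 0.25 at k=2, and 0.75 at k=3. Clipping to the historical range keeps scores bounded even when a sample is faster or slower than every past submission.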
