Title: MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation

May 19, 2024

Jianbo Dai, Jianqiao Lu, Yunlong Feng, Rongju Ruan, Ming Cheng, Haochen Tan, Zhijiang Guo

Abstract: Recent advancements in large language models (LLMs) have greatly improved code generation, specifically at the function level. For instance, GPT-4 has achieved an 88.4% pass rate on HumanEval. However, this calls into question the adequacy of existing benchmarks in thoroughly assessing function-level code generation capabilities. Our study analyzed two common benchmarks, HumanEval and MBPP, and found that these might not thoroughly evaluate LLMs' code generation capacities due to limitations in quality, difficulty, and granularity. To resolve this, we introduce the Mostly Hard Python Problems (MHPP) dataset, consisting of 140 unique human-curated problems. By focusing on the combination of natural language and code reasoning, MHPP gauges LLMs' abilities to comprehend specifications and restrictions, engage in multi-step reasoning, and apply coding knowledge effectively. Initial evaluations of 22 LLMs using MHPP showed that many models performing well on HumanEval failed to achieve similar success on MHPP. Moreover, MHPP highlighted previously undiscovered limitations in various LLMs, leading us to believe that it could pave the way for a better understanding of LLMs' capabilities and limitations. Dataset and code are available at https://github.com/SparksofAGI/MHPP.

* 39 pages, dataset and code are available at https://github.com/SparksofAGI/MHPP
