Title: StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

Feb 05, 2024

Shihan Dou, Yan Liu, Haoxiang Jia, Limao Xiong, Enyu Zhou, Wei Shen, Junjie Shan, Caishuang Huang, Xiao Wang, Xiaoran Fan (+7 more)

Abstract: The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning (RL) with compiler feedback to explore the output space of LLMs and enhance code generation quality. However, the lengthy code that LLMs generate in response to complex human requirements makes RL exploration challenging. Moreover, since unit tests may not cover complicated code, optimizing LLMs on these unexecuted code snippets is ineffective. To tackle these challenges, we introduce StepCoder, a novel RL framework for code generation consisting of two main components: CCCS addresses the exploration challenge by breaking the long-sequence code generation task into a Curriculum of Code Completion Subtasks, while FGO masks the unexecuted code segments so that only executed code is optimized, providing Fine-Grained Optimization. In addition, we construct the APPS+ dataset for RL training, which is manually verified to ensure the correctness of unit tests. Experimental results show that our method improves the ability to explore the output space and outperforms state-of-the-art approaches on the corresponding benchmarks. Our dataset APPS+ and StepCoder are available online.
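The two components named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `revealed_prefix` and `masked_mean_loss`, the idea of revealing a prefix of a reference solution at each curriculum stage, and the use of a plain per-token loss average are all assumptions made for illustration. The abstract states only that the task is split into a curriculum of completion subtasks and that unexecuted code is masked during optimization.

```python
from typing import List, Sequence


def revealed_prefix(solution_lines: Sequence[str], stage: int, num_stages: int) -> List[str]:
    """Hypothetical curriculum step: return the lines of a reference solution
    that are shown to the model at a given stage. Early stages reveal more,
    so the model only has to complete a short tail; the last stage reveals
    nothing, which is the full generation task."""
    if num_stages < 1 or not 0 <= stage < num_stages:
        raise ValueError("stage must satisfy 0 <= stage < num_stages")
    keep = len(solution_lines) * (num_stages - 1 - stage) // num_stages
    return list(solution_lines[:keep])


def masked_mean_loss(token_losses: Sequence[float], executed: Sequence[bool]) -> float:
    """Hypothetical fine-grained objective: average the per-token losses over
    tokens that belong to code the unit tests actually executed, ignoring
    the rest. Returns 0.0 when no token was executed."""
    if len(token_losses) != len(executed):
        raise ValueError("token_losses and executed must have the same length")
    kept = [loss for loss, ran in zip(token_losses, executed) if ran]
    return sum(kept) / len(kept) if kept else 0.0


# Toy data, assumed for illustration only.
solution = ["n = int(input())", "if n < 0:", "    n = -n", "print(n)"]
easiest = revealed_prefix(solution, stage=0, num_stages=4)  # first 3 lines shown
hardest = revealed_prefix(solution, stage=3, num_stages=4)  # nothing shown
loss = masked_mean_loss([1.0, 2.0, 3.0, 4.0], [True, False, True, False])
```

In this toy setup, a test that feeds a non-negative `n` never runs the line `n = -n`, so the tokens of that line would be marked `False` in `executed` and contribute nothing to the loss.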

* 13 pages, 5 figures
