Title: CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning

Oct 03, 2024

Huimu Yu, Xing Wu, Weidong Yin, Debing Zhang, Songlin Hu

Abstract: Large language models (LLMs) have made significant progress in natural language understanding and generation, driven by scalable pretraining and advanced finetuning. However, enhancing reasoning abilities in LLMs, particularly via reinforcement learning from human feedback (RLHF), remains challenging due to the scarcity of high-quality preference data, which is labor-intensive to annotate and crucial for reward model (RM) finetuning. To alleviate this issue, we introduce CodePMP, a scalable preference model pretraining (PMP) pipeline that utilizes a large corpus of synthesized code-preference pairs from publicly available high-quality source code. CodePMP improves RM finetuning efficiency by pretraining preference models on large-scale synthesized code-preference pairs. We evaluate CodePMP on mathematical reasoning tasks (GSM8K, MATH) and logical reasoning tasks (ReClor, LogiQA2.0), consistently showing significant improvements in reasoning performance of LLMs and highlighting the importance of scalable preference model pretraining for efficient reward modeling.

* work in progress
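
The abstract does not state which objective CodePMP uses to train on preference pairs. As a rough illustration only, the sketch below shows the pairwise (Bradley-Terry style) ranking loss that is commonly used in reward modeling, where the model is pushed to score the preferred response above the rejected one. The function names and the example scores are hypothetical and are not taken from the paper.

```python
import math


def pairwise_preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Return -log(sigmoid(score_chosen - score_rejected)).

    Assumption: a Bradley-Terry style objective; the abstract does not
    confirm that CodePMP uses exactly this loss.
    """
    margin = score_chosen - score_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(pairs: list[tuple[float, float]]) -> float:
    """Average the pairwise loss over (chosen_score, rejected_score) pairs."""
    return sum(pairwise_preference_loss(c, r) for c, r in pairs) / len(pairs)


# Hypothetical scalar scores a preference model might assign to two
# code-preference pairs: (score of preferred code, score of rejected code).
example_pairs = [(2.0, 0.5), (0.1, 0.4)]
batch_loss = mean_preference_loss(example_pairs)
```

The loss shrinks as the preferred response is scored further above the rejected one, and it grows roughly linearly when the ordering is wrong, which is why a pair ranked incorrectly dominates the batch average.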
