Title: An Empirical Study on Eliciting and Improving R1-like Reasoning Models

Mar 06, 2025

Zhipeng Chen, Yingqian Min, Beichen Zhang, Jie Chen, Jinhao Jiang, Daixuan Cheng, Wayne Xin Zhao, Zheng Liu, Xu Miao, Yang Lu (+3 more)

Abstract: This is the third technical report on the development of slow-thinking models as part of the STILL project. As the technical pathway becomes clearer, scaling RL training has become a central technique for implementing such reasoning models. We systematically experiment with and document the effects of various factors influencing RL training, conducting experiments on both base models and fine-tuned models. Specifically, we demonstrate that our RL training approach consistently improves the Qwen2.5-32B base models, increasing both response length and test accuracy. Furthermore, we show that even when a model like DeepSeek-R1-Distill-Qwen-1.5B has already achieved a high performance level, it can be further refined through RL training, reaching an accuracy of 39.33% on AIME 2024. Beyond RL training, we also explore tool manipulation and find that it significantly boosts the reasoning performance of large reasoning models. This approach achieves an accuracy of 86.67% with greedy search on AIME 2024, underscoring its effectiveness in enhancing model capabilities. We release our resources at the STILL project website: https://github.com/RUCAIBox/Slow_Thinking_with_LLMs.

* Technical Report on Slow Thinking with LLMs: Part III
