An Bo

**Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning**

Jun 20, 2024

Chaojie Wang, Yanchen Deng, Zhiyi Lv, Shuicheng Yan, An Bo

Figures 1-4 for Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning

Abstract: Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks. However, the auto-regressive generation process makes LLMs prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning. In this paper, we aim to alleviate this pathology by introducing Q*, a general, versatile, and agile framework for guiding the LLM decoding process with deliberative planning. By learning a plug-and-play Q-value model as a heuristic function, Q* can effectively guide LLMs to select the most promising next step without fine-tuning them for each task, which avoids significant computational overhead and the potential risk of performance degeneration on other tasks. Extensive experiments on GSM8K, MATH, and MBPP confirm the superiority of our method.
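The idea of using a learned Q-value as a heuristic to pick the next reasoning step can be illustrated with a small search loop. The following is a minimal sketch, not the paper's implementation: the abstract does not specify the search procedure, so this uses plain best-first search ranked only by the Q-value. The names `best_first_decode`, `propose_steps`, `q_value`, and `is_terminal` are hypothetical; in practice `propose_steps` would sample candidate steps from a frozen LLM and `q_value` would be the learned plug-and-play model.

```python
import heapq
from typing import Callable, List, Tuple

State = Tuple[str, ...]  # partial reasoning trace: the steps taken so far


def best_first_decode(
    question: str,
    propose_steps: Callable[[str, State], List[str]],  # hypothetical: LLM proposals
    q_value: Callable[[str, State, str], float],       # hypothetical: learned heuristic
    is_terminal: Callable[[State], bool],
    max_expansions: int = 100,
) -> State:
    """Expand the highest-scoring partial trace until a terminal trace is popped."""
    # heapq is a min-heap, so scores are negated; the counter breaks ties.
    frontier: list = [(0.0, 0, ())]
    counter = 1
    last: State = ()
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, state = heapq.heappop(frontier)
        last = state
        if is_terminal(state):
            return state
        for step in propose_steps(question, state):
            score = q_value(question, state, step)
            heapq.heappush(frontier, (-score, counter, state + (step,)))
            counter += 1
    return last  # budget exhausted: return the last trace that was expanded


# Toy stand-ins (assumptions for illustration only): reach a total of 3.
def total(state: State) -> int:
    return sum(int(s) for s in state)

def propose_steps(question: str, state: State) -> List[str]:
    return ["+1", "+2"]

def q_value(question: str, state: State, step: str) -> float:
    return -abs(3 - (total(state) + int(step)))  # closer to 3 scores higher

def is_terminal(state: State) -> bool:
    return total(state) == 3

trace = best_first_decode("reach 3", propose_steps, q_value, is_terminal)
# trace == ('+2', '+1')
```

Because the language model is only queried for candidate steps and never updated, swapping in a different `q_value` is enough to retarget the search to another task, which mirrors the plug-and-play claim in the abstract.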
