Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kexin Luan

Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search

Jan 02, 2025

Shuangtao Li, Shuaihao Dong, Kexin Luan, Xinhan Di, Chaofan Ding

Figure 1 for Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search

Figure 2 for Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search

Figure 3 for Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search

Abstract:Large language models (LLMs) have demonstrated their remarkable capacity across a variety of tasks. However, reasoning remains a challenge for LLMs. To improve LLMs' reasoning ability, process supervision has proven to be better than outcome supervision. In this work, we study using Monte Carlo Tree Search (MCTS) to generate process supervision data with LLMs themselves for training them. We sample reasoning steps with an LLM and assign each step a score that captures its "relative correctness," and the LLM is then trained by minimizing weighted log-likelihood of generating the reasoning steps. This generate-then-train process is repeated iteratively until convergence.Our experimental results demonstrate that the proposed methods considerably improve the performance of LLMs on two mathematical reasoning datasets. Furthermore, models trained on one dataset also exhibit improved performance on the other, showing the transferability of the enhanced reasoning ability.

* 5 pages, 1 figure, 2 tables accepted by aaai 2025 NeurMAD workshop

Via

Access Paper or Ask Questions

Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning

Dec 23, 2024

Huchen Jiang, Yangyang Ma, Chaofan Ding, Kexin Luan, Xinhan Di

Figure 1 for Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning

Figure 2 for Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning

Figure 3 for Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning

Figure 4 for Towards Intrinsic Self-Correction Enhancement in Monte Carlo Tree Search Boosted Reasoning via Iterative Preference Learning

Abstract:With current state-of-the-art approaches aimed at enhancing the reasoning capabilities of Large Language Models(LLMs) through iterative preference learning inspired by AlphaZero, we propose to further enhance the step-wise reasoning capabilities through intrinsic self-correction to some extent. Our work leverages step-wise preference learning to enhance self-verification via reinforcement learning. We initially conduct our work through a two-stage training procedure. At the first stage, the self-correction reasoning ability of an LLM is enhanced through its own predictions, relying entirely on self-generated data within the intrinsic self-correction to some extent. At the second stage, the baseline step-wise preference learning is leveraged via the application of the enhanced self-correct policy achieved at the first stage. In the evaluation of arithmetic reasoning tasks, our approach outperforms OpenMath2-Llama3.1-8B, dart-math-mistral-7b-uniform on MATH with increases in accuracy to 71.34%(+4.18%) and 48.06%(+4.94%) and LLama-3.1-8B-Instruct, Mistral-7B-Instruct-v0.1 on GSM8K with increases in accuracy to 86.76%(+2.00%) and 38.06%(+2.28%).

* 6 Pages,3 figures, accepted by AAAI 2025 Workshop NeurMAD

Via

Access Paper or Ask Questions

Low-Rank Adaptation with Task-Relevant Feature Enhancement for Fine-tuning Language Models

Dec 13, 2024

Changqun Li, Chaofan Ding, Kexin Luan, Xinhan Di

Figure 1 for Low-Rank Adaptation with Task-Relevant Feature Enhancement for Fine-tuning Language Models

Figure 2 for Low-Rank Adaptation with Task-Relevant Feature Enhancement for Fine-tuning Language Models

Figure 3 for Low-Rank Adaptation with Task-Relevant Feature Enhancement for Fine-tuning Language Models

Figure 4 for Low-Rank Adaptation with Task-Relevant Feature Enhancement for Fine-tuning Language Models

Abstract:Fine-tuning pre-trained large language models in a parameter-efficient manner is widely studied for its effectiveness and efficiency. LoRA is one of the most widely used methods, which assumes that the optimization process is essentially low dimensional. Although LoRA has demonstrated commendable performance, there remains a significant performance gap between LoRA and full fine-tuning when learning new tasks. In this work, we propose Low-Rank Adaptation with Task-Relevant Feature Enhancement(LoRATRF) for enhancing task-relevant features from the perspective of editing neural network representations. To prioritize task-relevant features, a task-aware filter that selectively extracts valuable knowledge from hidden representations for the target or current task is designed. As the experiments on a vareity of datasets including NLU, commonsense reasoning and mathematical reasoning tasks demonstrates, our method reduces 33.71% parameters and achieves better performance on a variety of datasets in comparison with SOTA low-rank methods.

* 6 Pages, 3 figures accepted by AAAI 2025 CoLoRAI - Connecting Low-Rank Representations in AI Workshop

Via

Access Paper or Ask Questions