
Title: Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models

Sep 14, 2024

Yuanzhao Zhai, Tingkai Yang, Kele Xu, Feng Dawei, Cheng Yang, Bo Ding, Huaimin Wang


Abstract: Agents significantly enhance the capabilities of standalone Large Language Models (LLMs) by perceiving environments, making decisions, and executing actions. However, LLM agents still struggle with tasks that require multiple decision-making steps. Estimating the value of actions in a specific task is difficult when intermediate actions are neither appropriately rewarded nor penalized. In this paper, we propose leveraging a task-relevant Q-value model to guide action selection. Specifically, we first collect decision-making trajectories annotated with step-level Q values via Monte Carlo Tree Search (MCTS) and construct preference data from them. We then use another LLM to fit these preferences through step-level Direct Preference Optimization (DPO); this LLM serves as the Q-value model. During inference, at each decision-making step, the LLM agent selects the action with the highest Q value before interacting with the environment. We apply our method to various open-source and API-based LLM agents and show that Q-value models significantly improve their performance. Notably, when enhanced with Q-value models, the agent built on Phi-3-mini-4k-instruct improved by 103% on WebShop and 75% on HotPotQA, even surpassing GPT-4o-mini. Q-value models also offer further advantages, such as generalization to different LLM agents and seamless integration with existing prompting strategies.
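The pipeline described in the abstract (MCTS-annotated step-level Q values, then preference pairs, then step-level DPO, then picking the highest-Q action at inference) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The running-mean backup of a single terminal reward, the "best action vs. every lower-valued sibling" pairing rule, the value of beta, and scoring actions with the DPO implicit reward beta * log(pi_theta / pi_ref) are all our assumptions. Every function name is hypothetical, and log-probabilities are passed in as plain floats instead of being computed by an LLM.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical layout: stats[state][action] = [visit_count, mean_value]
Stats = Dict[str, Dict[str, List[float]]]


def backup(stats: Stats, trajectory: Sequence[Tuple[str, str]], reward: float) -> None:
    """MCTS-style backup (assumed): fold one rollout's terminal reward into the
    running mean Q(s, a) of every (state, action) step on its path, undiscounted."""
    for state, action in trajectory:
        entry = stats[state].setdefault(action, [0.0, 0.0])
        entry[0] += 1
        entry[1] += (reward - entry[1]) / entry[0]


@dataclass
class StepPreference:
    state: str
    chosen: str    # action with the highest estimated Q at this step
    rejected: str  # a sibling action with a strictly lower estimated Q


def build_preferences(stats: Stats) -> List[StepPreference]:
    """Hypothetical pairing rule: best-valued action vs. each lower-valued sibling."""
    prefs = []
    for state, actions in stats.items():
        if len(actions) < 2:
            continue  # no sibling to compare against
        best = max(actions, key=lambda a: actions[a][1])
        for action, (_, q) in actions.items():
            if q < actions[best][1]:
                prefs.append(StepPreference(state, best, action))
    return prefs


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w: float, logp_l: float, ref_logp_w: float,
             ref_logp_l: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one pair: -log sigmoid(beta * (ratio_w - ratio_l)),
    where ratio = log pi_theta(a|s) - log pi_ref(a|s)."""
    z = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return softplus(-z)  # -log sigmoid(z) == softplus(-z)


def implicit_q(logp: float, ref_logp: float, beta: float = 0.1) -> float:
    """Assumed Q score: the DPO implicit reward beta * log(pi_theta / pi_ref)."""
    return beta * (logp - ref_logp)


def select_action(state: str, candidates: Sequence[str],
                  q_fn: Callable[[str, str], float]) -> str:
    """Inference: score each candidate action and act on the highest-Q one."""
    return max(candidates, key=lambda a: q_fn(state, a))


# Toy usage with made-up WebShop-style actions and rewards (illustrative only).
stats: Stats = defaultdict(dict)
backup(stats, [("s0", "search[red shoes]")], reward=1.0)
backup(stats, [("s0", "click[back]")], reward=0.0)
backup(stats, [("s0", "search[shoes]")], reward=0.5)
prefs = build_preferences(stats)  # two pairs, both with chosen == "search[red shoes]"
```

In a real setup the floats fed to `dpo_loss` and `implicit_q` would come from the trainable LLM and a frozen reference copy, and `select_action` would receive candidate actions sampled from the agent; keeping those as plain numbers here only makes the data flow easy to follow.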
