Title: ACING: Actor-Critic for Instruction Learning in Black-Box Large Language Models

Nov 19, 2024

Salma Kharrat, Fares Fourati, Marco Canini

Abstract: The effectiveness of Large Language Models (LLMs) in solving tasks depends heavily on the quality of the instructions, which often require fine-tuning through extensive human effort. This highlights the need for automated instruction optimization; however, this optimization is particularly challenging when dealing with black-box LLMs, where model parameters and gradients remain inaccessible. We propose ACING, a task-specific prompt optimization approach framed as a stateless continuous-action Reinforcement Learning (RL) problem, known as the continuum bandit setting. ACING leverages an actor-critic-based method to optimize prompts, learning from non-differentiable reward signals. We validate ACING by optimizing prompts for ChatGPT on 30 instruction-based tasks. ACING consistently outperforms baseline methods, achieving a median score improvement of 10 percentage points. Furthermore, ACING not only recovers but also surpasses human-crafted expert instructions, achieving up to a 39 percentage point improvement against human benchmarks.
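
To make the continuum bandit framing concrete, here is a minimal sketch of a stateless, continuous-action learner with an actor and a critic. It is not the ACING implementation, and none of it comes from the paper. The actor is assumed to be a Gaussian policy with a learnable mean, the critic a single scalar estimate of expected reward used as a baseline, and `black_box_reward` a hypothetical toy function standing in for the score obtained by querying a black-box LLM. The reward is only ever evaluated, never differentiated.

```python
import numpy as np


def black_box_reward(action, target):
    """Hypothetical stand-in for scoring an action with a black-box LLM.

    The learner only sees the returned number, never a gradient.
    """
    return -float(np.sum((action - target) ** 2))


def run_bandit_actor_critic(target, steps=5000, sigma=0.3,
                            actor_lr=0.005, critic_lr=0.05, seed=0):
    """Stateless actor-critic loop (assumed setup, for illustration only)."""
    rng = np.random.default_rng(seed)
    mu = np.zeros_like(target)  # actor: mean of a Gaussian policy
    value = 0.0                 # critic: estimate of expected reward
    for _ in range(steps):
        # Sample a continuous action from the current policy.
        action = mu + sigma * rng.standard_normal(mu.shape)
        reward = black_box_reward(action, target)
        # With no state, the one-step error is simply reward minus baseline.
        advantage = reward - value
        # Score-function gradient of a Gaussian w.r.t. its mean.
        mu = mu + actor_lr * advantage * (action - mu) / sigma**2
        # Move the critic toward the observed reward.
        value = value + critic_lr * advantage
    return mu, value


target = np.array([0.5, -0.3])  # optimum of the toy reward (assumption)
mu, value = run_bandit_actor_critic(target)
print("learned mean:", mu)
print("critic estimate:", value)
```

In this toy problem the learned mean should drift toward `target` even though the update never uses a gradient of the reward. A realistic setup would replace the toy reward with a task score from the LLM and would likely use richer actor and critic models.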
