Title: Sample-Efficient Alignment for LLMs

Nov 03, 2024

Zichen Liu, Changyu Chen, Chao Du, Wee Sun Lee, Min Lin



Abstract: We study methods for efficiently aligning large language models (LLMs) with human preferences given budgeted online feedback. We first formulate the LLM alignment problem in the framework of contextual dueling bandits. This formulation, which subsumes recent paradigms such as online RLHF and online DPO, inherently calls for sample-efficient algorithms that incorporate online active exploration. Leveraging insights from bandit theory, we introduce a unified algorithm based on Thompson sampling and highlight its applications in two distinct LLM alignment scenarios. The practical agent that efficiently implements this algorithm, named SEA (Sample-Efficient Alignment), is empirically validated through extensive experiments across three model scales (1B, 2.8B, 6.9B) and three preference learning algorithms (DPO, IPO, SLiC). The results demonstrate that SEA achieves highly sample-efficient alignment with the oracle's preferences, outperforming recent active exploration methods for LLMs. Additionally, we release the implementation of SEA together with an efficient codebase designed for online alignment of LLMs, aiming to accelerate future research in this field.
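To make the Thompson-sampling idea in the abstract concrete, here is a toy sketch of a dueling bandit with a fixed set of candidate responses. It is not the SEA agent and not the released codebase: it drops the context (prompt), replaces a learned reward model with independent Beta posteriors over pairwise win rates, and simulates the preference oracle with a Bradley-Terry model. All names (`bt_prob`, `sample_borda_scores`, `run_dueling_ts`, `best_arm`) and the reward values are hypothetical, chosen only for illustration.

```python
import math
import random


def bt_prob(r_a: float, r_b: float) -> float:
    """Bradley-Terry probability that the arm with reward r_a is preferred."""
    return 1.0 / (1.0 + math.exp(-(r_a - r_b)))


def sample_borda_scores(wins, rng):
    """Draw one posterior sample of each arm's average win probability."""
    k = len(wins)
    scores = []
    for i in range(k):
        total = 0.0
        for j in range(k):
            if i != j:
                # Beta(1, 1) prior on P(i beats j), updated with observed duels.
                total += rng.betavariate(wins[i][j] + 1, wins[j][i] + 1)
        scores.append(total / (k - 1))
    return scores


def run_dueling_ts(true_rewards, budget, seed=0):
    """Spend `budget` oracle queries; return the matrix of pairwise win counts."""
    rng = random.Random(seed)
    k = len(true_rewards)
    wins = [[0] * k for _ in range(k)]
    for _ in range(budget):
        # Two independent posterior samples pick the two arms to compare.
        first = sample_borda_scores(wins, rng)
        a = max(range(k), key=first.__getitem__)
        second = sample_borda_scores(wins, rng)
        b = max((i for i in range(k) if i != a), key=second.__getitem__)
        # Simulated oracle (assumption): Bradley-Terry over hidden rewards.
        if rng.random() < bt_prob(true_rewards[a], true_rewards[b]):
            wins[a][b] += 1
        else:
            wins[b][a] += 1
    return wins


def best_arm(wins):
    """Arm with the highest posterior-mean average win probability."""
    k = len(wins)

    def mean_score(i):
        return sum(
            (wins[i][j] + 1) / (wins[i][j] + wins[j][i] + 2)
            for j in range(k)
            if j != i
        )

    return max(range(k), key=mean_score)


if __name__ == "__main__":
    wins = run_dueling_ts([0.0, 0.5, 3.0, 1.0], budget=2000)
    print("estimated best arm:", best_arm(wins))
```

Because both arms are chosen from posterior samples rather than posterior means, pairs whose outcome is still uncertain keep being queried, which is the sense in which exploration is "active" in this toy setting. A contextual version would condition the posterior on the prompt, which this sketch does not attempt.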
