Title: Weak-to-Strong Extrapolation Expedites Alignment

Apr 25, 2024

Chujie Zheng, Ziqi Wang, Heng Ji, Minlie Huang, Nanyun Peng

[Figures 1-4 for Weak-to-Strong Extrapolation Expedites Alignment]

Abstract: Although the capabilities of large language models (LLMs) ideally scale up with increasing data and compute, they are inevitably constrained by limited resources in practice. Suppose we have a moderately trained LLM in hand (e.g., one trained to align with human preference): can we further exploit its potential and cheaply acquire a stronger model? In this paper, we propose a simple method called ExPO to boost LLMs' alignment with human preference. ExPO assumes that a medium-aligned model can be interpolated between a less-aligned (weaker) model, e.g., the initial SFT model, and a better-aligned (stronger) one, so that the stronger model can be obtained directly by extrapolating from the weights of the two weaker models. On the AlpacaEval 2.0 benchmark, we show that ExPO pushes models trained with less preference data (e.g., 10% or 20%) to reach and even surpass the fully trained one, without any additional training. Furthermore, ExPO significantly improves off-the-shelf DPO/RLHF models and exhibits decent scalability across model sizes from 7B to 70B. Our work demonstrates the efficacy of model extrapolation in exploiting LLMs' capabilities, suggesting a promising direction that deserves future exploration.
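The interpolation assumption in the abstract can be illustrated with a small sketch. If the medium-aligned weights are a mix of the weak and strong weights, medium = (1 - lam) * weak + lam * strong, then solving for the strong weights gives strong = medium + alpha * (medium - weak) with alpha = 1/lam - 1. The names `extrapolate`, `alpha`, and `lam`, and the representation of a model as a dict of float lists, are assumptions made here for illustration only; this is not the authors' code, and how the extrapolation strength is chosen in practice is not stated in the abstract.

```python
from typing import Dict, List

# Hypothetical stand-in for a model checkpoint: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def extrapolate(weak: Weights, medium: Weights, alpha: float) -> Weights:
    """Return medium + alpha * (medium - weak), computed per parameter.

    alpha = 0 returns a copy of the medium model; larger alpha moves further
    along the direction from the weak model to the medium model.
    """
    if weak.keys() != medium.keys():
        raise ValueError("both models must have the same parameter names")

    result: Weights = {}
    for name, m_values in medium.items():
        w_values = weak[name]
        if len(w_values) != len(m_values):
            raise ValueError(f"size mismatch for parameter {name!r}")
        result[name] = [m + alpha * (m - w) for w, m in zip(w_values, m_values)]
    return result


if __name__ == "__main__":
    # Toy numbers: the medium model sits halfway (lam = 0.5) between weak and strong.
    weak = {"layer.weight": [0.0, 1.0]}
    strong = {"layer.weight": [2.0, 2.0]}
    lam = 0.5
    medium = {
        name: [(1 - lam) * w + lam * s for w, s in zip(weak[name], strong[name])]
        for name in weak
    }
    alpha = 1 / lam - 1  # follows from the interpolation assumption
    print(extrapolate(weak, medium, alpha))
```

In this toy case the extrapolated weights coincide with the strong model because the interpolation assumption holds exactly by construction; for real checkpoints it is only an approximation.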
