Title: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models

Aug 01, 2024

Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang

Abstract: The optimal training configurations of large language models (LLMs) with respect to model sizes and compute budgets have been extensively studied. However, how to optimally configure LLMs during inference has not been explored in sufficient depth. We study compute-optimal inference: designing models and inference strategies that optimally trade off additional inference-time compute for improved performance. As a first step towards understanding and designing compute-optimal inference methods, we assessed the effectiveness and computational efficiency of multiple inference strategies such as Greedy Search, Majority Voting, Best-of-N, Weighted Voting, and their variants on two different Tree Search algorithms, involving different model sizes and computational budgets. We found that a smaller language model with a novel tree search algorithm typically achieves a Pareto-optimal trade-off. These results highlight the potential benefits of deploying smaller models equipped with more sophisticated decoding algorithms in budget-constrained scenarios, e.g., on end-devices, to enhance problem-solving accuracy. For instance, we show that the Llemma-7B model can achieve accuracy competitive with a Llemma-34B model on MATH500 while using $2\times$ fewer FLOPs. Our findings could potentially apply to any generation task with a well-defined measure of success.
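
As a rough illustration of three of the sample-aggregation strategies named in the abstract (Majority Voting, Best-of-N, and Weighted Voting), here is a minimal Python sketch. It assumes each sampled solution has already been reduced to a final answer string and that some reward model has assigned it a scalar score. The function names, the example answers, and the scores are hypothetical and are not taken from the paper, whose exact formulations may differ.

```python
from collections import Counter, defaultdict


def majority_vote(answers):
    """Return the final answer that appears most often among the samples."""
    return Counter(answers).most_common(1)[0][0]


def best_of_n(answers, scores):
    """Return the answer of the single sample with the highest score."""
    best_index = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_index]


def weighted_vote(answers, scores):
    """Sum the scores of samples sharing an answer; return the top answer."""
    totals = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.get)


# Hypothetical data: six sampled final answers and made-up reward scores.
answers = ["42", "41", "42", "7", "41", "41"]
scores = [0.9, 0.2, 0.8, 0.95, 0.1, 0.3]

print(majority_vote(answers))          # most frequent answer
print(best_of_n(answers, scores))      # answer of the top-scored sample
print(weighted_vote(answers, scores))  # answer with the largest score sum
```

On this toy input the three strategies select three different answers, which shows why the choice of aggregation rule matters once the number of samples, and therefore the inference compute, is fixed.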
