Alon Jacoby

Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models

Feb 19, 2024

Mosh Levy, Alon Jacoby, Yoav Goldberg

Figures 1 to 4 for Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models

Abstract: This paper explores the impact of extending input lengths on the capabilities of Large Language Models (LLMs). Despite recent advances in LLMs, their performance consistency across different input lengths is not well understood. We investigate this aspect by introducing a novel QA reasoning framework specifically designed to assess the impact of input length. We isolate the effect of input length using multiple versions of the same sample, each extended with padding of different lengths, types, and locations. Our findings show a notable degradation in LLMs' reasoning performance at input lengths much shorter than their technical maximum. We show that the degradation trend appears in every version of our dataset, although at different intensities. Additionally, our study reveals that traditional perplexity metrics do not correlate with LLMs' performance on long-input reasoning tasks. We analyse our results and identify failure modes that can serve as useful guides for future research, potentially informing strategies to address the limitations observed in LLMs.
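
The padding idea described in the abstract (one sample, several padded versions that differ in length, padding type, and location) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the `start`/`middle`/`end` location labels, the toy sample, and the whitespace-based token count are all hypothetical choices made for illustration. A different padding type would correspond to passing a different `filler` list.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words. ASSUMPTION: a real experiment
    # would count tokens with the evaluated model's own tokenizer.
    return len(text.split())


def build_padding(filler: List[str], budget: int) -> List[str]:
    """Cycle through filler sentences until adding the next one would exceed `budget` tokens."""
    sentences = [s for s in filler if count_tokens(s) > 0]  # drop empty strings (avoids an endless loop)
    padding: List[str] = []
    used = 0
    i = 0
    while sentences and used < budget:
        sentence = sentences[i % len(sentences)]
        n = count_tokens(sentence)
        if used + n > budget:
            break
        padding.append(sentence)
        used += n
        i += 1
    return padding


def pad_sample(key_paragraphs: List[str], filler: List[str],
               target_tokens: int, location: str) -> str:
    """Return one padded version of a sample; key paragraphs stay intact and in order.

    Hypothetical location labels:
      "start"  -> key paragraphs first, padding after
      "middle" -> padding split (by sentence count) around the key paragraphs
      "end"    -> padding first, key paragraphs last
    """
    base = sum(count_tokens(p) for p in key_paragraphs)
    padding = build_padding(filler, max(0, target_tokens - base))
    if location == "start":
        parts = key_paragraphs + padding
    elif location == "middle":
        half = len(padding) // 2
        parts = padding[:half] + key_paragraphs + padding[half:]
    elif location == "end":
        parts = padding + key_paragraphs
    else:
        raise ValueError(f"unknown location: {location!r}")
    return " ".join(parts)


if __name__ == "__main__":
    # Toy sample: the question would be answerable from `key` alone;
    # `filler` is hypothetical text unrelated to that question.
    key = ["Alice is taller than Bob.", "Bob is taller than Carol."]
    filler = ["The weather was mild that week.", "Trains ran on schedule."]
    for target in (20, 50):
        for loc in ("start", "middle", "end"):
            text = pad_sample(key, filler, target, loc)
            print(target, loc, count_tokens(text))
```

Because only the filler changes between versions, a difference in a model's accuracy across these strings would reflect input length and padding placement rather than the underlying task, which is the isolation idea the abstract describes.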
