Title: LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion

Jan 25, 2025

Zhan Ling, Kang Liu, Kai Yan, Yifan Yang, Weijian Lin, Ting-Han Fan, Lingfeng Shen, Zhengyin Du, Jiecao Chen

[Figures 1 to 4 for LongReason: A Synthetic Long-Context Reasoning Benchmark via Context Expansion]


Abstract: Large language models (LLMs) have demonstrated remarkable progress in understanding long-context inputs. However, benchmarks for evaluating the long-context reasoning abilities of LLMs have not kept pace. Existing benchmarks often cover a narrow range of tasks, or tasks that do not demand complex reasoning. To address this gap and enable a more comprehensive evaluation of the long-context reasoning capabilities of current LLMs, we propose a new synthetic benchmark, LongReason, constructed by synthesizing long-context reasoning questions from a varied set of short-context reasoning questions through context expansion. LongReason consists of 794 multiple-choice reasoning questions with diverse reasoning patterns across three task categories: reading comprehension, logical inference, and mathematical word problems. We evaluate 21 LLMs on LongReason and find that most models experience significant performance drops as context length increases. Further analysis shows that even state-of-the-art LLMs still have substantial room for improvement in reasoning robustly across different tasks. We will open-source LongReason to support comprehensive evaluation of LLMs' long-context reasoning capabilities.
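The abstract does not spell out how context expansion is performed. The following is a minimal, hypothetical sketch of the general idea, assuming "expansion" means scattering a short question's key background sentences among irrelevant filler text until a target length is reached, then appending the original multiple-choice question. `ShortQuestion`, `expand_context`, `build_prompt`, and the filler corpus are invented for illustration and are not the authors' pipeline.

```python
import random
from dataclasses import dataclass


# Hypothetical data structure; not taken from the LongReason release.
@dataclass
class ShortQuestion:
    """A short-context multiple-choice reasoning item."""
    background: list[str]  # sentences the reasoning depends on
    question: str
    options: list[str]
    answer: str  # letter of the correct option, e.g. "B"


def expand_context(q: ShortQuestion, filler: list[str], target_words: int, seed: int = 0) -> str:
    """Hypothetical context expansion: draw random filler sentences until the
    context holds at least `target_words` words (or the filler runs out), then
    interleave the background sentences at random slots. Walking the slots in
    order keeps the background sentences in their original relative order."""
    rng = random.Random(seed)
    words = sum(len(s.split()) for s in q.background)
    pool = list(filler)
    padding: list[str] = []
    while words < target_words and pool:
        sentence = pool.pop(rng.randrange(len(pool)))
        padding.append(sentence)
        words += len(sentence.split())

    total = len(padding) + len(q.background)
    key_slots = set(rng.sample(range(total), len(q.background)))
    bg_iter, pad_iter = iter(q.background), iter(padding)
    merged = [next(bg_iter) if i in key_slots else next(pad_iter) for i in range(total)]
    return " ".join(merged)


def build_prompt(q: ShortQuestion, context: str) -> str:
    """Append the unchanged question and lettered options to the expanded context."""
    letters = "ABCDEFGHIJ"
    lines = [context, "", f"Question: {q.question}"]
    lines += [f"{letters[i]}. {opt}" for i, opt in enumerate(q.options)]
    lines.append("Answer with the letter of the correct option.")
    return "\n".join(lines)


# Usage: the same short question rendered at several (hypothetical) context lengths.
q = ShortQuestion(
    background=["Alice has 3 apples.", "Bob gives Alice 2 more apples."],
    question="How many apples does Alice have now?",
    options=["4", "5", "6", "7"],
    answer="B",
)
filler = [f"Filler sentence number {i} is irrelevant." for i in range(100)]
for target in (50, 200, 400):
    ctx = expand_context(q, filler, target_words=target)
    print(target, len(ctx.split()))
```

Because the question, options, and answer stay fixed while only the surrounding context grows, accuracy on such variants can be compared across lengths, which is the kind of comparison behind the reported performance drops. A real benchmark would likely use more natural filler than this toy corpus.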
