Title: LongFaith: Enhancing Long-Context Reasoning in LLMs with Faithful Synthetic Data

Feb 18, 2025

Cehao Yang, Xueyuan Lin, Chengjin Xu, Xuhui Jiang, Shengjie Ma, Aofan Liu, Hui Xiong, Jian Guo

Abstract: Despite the growing development of long-context large language models (LLMs), data-centric approaches relying on synthetic data have been hindered by issues related to faithfulness, which limit their effectiveness in enhancing model performance on tasks such as long-context reasoning and question answering (QA). These challenges are often exacerbated by misinformation caused by lack of verification, reasoning without attribution, and potential knowledge conflicts. We propose LongFaith, a novel pipeline for synthesizing faithful long-context reasoning instruction datasets. By integrating ground truth and citation-based reasoning prompts, we eliminate distractions and improve the accuracy of reasoning chains, thus mitigating the need for costly verification processes. We open-source two synthesized datasets, LongFaith-SFT and LongFaith-PO, which systematically address multiple dimensions of faithfulness, including verified reasoning, attribution, and contextual grounding. Extensive experiments on multi-hop reasoning datasets and LongBench demonstrate that models fine-tuned on these datasets significantly improve performance. Our ablation studies highlight the scalability and adaptability of the LongFaith pipeline, showcasing its broad applicability in developing long-context LLMs.
