Yiteng Tu

RbFT: Robust Fine-tuning for Retrieval-Augmented Generation against Retrieval Defects

Jan 30, 2025

Yiteng Tu, Weihang Su, Yujia Zhou, Yiqun Liu, Qingyao Ai

Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) by integrating external knowledge retrieved from a knowledge base. However, its effectiveness is fundamentally constrained by the reliability of both the retriever and the knowledge base. In real-world scenarios, imperfections in these components often lead to the retrieval of noisy, irrelevant, or misleading counterfactual information, ultimately undermining the trustworthiness of RAG systems. To address this challenge, we propose Robust Fine-Tuning (RbFT), a method designed to enhance the resilience of LLMs against retrieval defects through two targeted fine-tuning tasks. Experimental results demonstrate that RbFT significantly improves the robustness of RAG systems across diverse retrieval conditions, surpassing existing methods while maintaining high inference efficiency and compatibility with other robustness techniques.

PRE: A Peer Review Based Large Language Model Evaluator

Jan 28, 2024

Zhumin Chu, Qingyao Ai, Yiteng Tu, Haitao Li, Yiqun Liu

Abstract: The impressive performance of large language models (LLMs) has attracted considerable attention from the academic and industrial communities. Beyond how to construct and train LLMs, how to effectively evaluate and compare their capabilities is widely recognized as an important yet difficult problem. Existing paradigms rely on either human annotators or model-based evaluators to evaluate the performance of LLMs on different tasks. However, these paradigms often suffer from high cost, low generalizability, and inherited biases in practice, which make them incapable of supporting the sustainable development of LLMs in the long term. To address these issues, inspired by the peer review systems widely used in the academic publication process, we propose a novel framework that can automatically evaluate LLMs through a peer-review process. Specifically, for the evaluation of a specific task, we first construct a small qualification exam to select "reviewers" from a couple of powerful LLMs. Then, to actually evaluate the "submissions" written by different candidate LLMs, i.e., the evaluatees, we use the reviewer LLMs to rate or compare the submissions. The final ranking of evaluatee LLMs is generated based on the results provided by all reviewers. We conducted extensive experiments on text summarization tasks with eleven LLMs including GPT-4. The results demonstrate the existence of bias when a single LLM is used as the evaluator. Also, our PRE model outperforms all the baselines, illustrating the effectiveness of the peer review mechanism.

* 11 pages
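
The peer-review flow described in the PRE abstract (qualify reviewers, collect their ratings, aggregate into a ranking) can be illustrated with a small sketch. This is not the authors' implementation: the accuracy threshold, the unweighted mean used for aggregation, and every name and number below are hypothetical choices made only for illustration, since the abstract does not specify how reviewer results are combined.

```python
from statistics import mean


def select_reviewers(exam_accuracy, threshold=0.6):
    """Keep candidate reviewers whose qualification-exam accuracy meets the threshold.

    The threshold value is an assumption; the abstract only says an exam is used.
    """
    return [name for name, acc in exam_accuracy.items() if acc >= threshold]


def rank_evaluatees(ratings, reviewers):
    """Average each evaluatee's ratings over the qualified reviewers, then sort.

    A plain mean is an assumed aggregation rule, not one taken from the paper.
    """
    scores = {
        evaluatee: mean(by_reviewer[r] for r in reviewers if r in by_reviewer)
        for evaluatee, by_reviewer in ratings.items()
    }
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Hypothetical qualification-exam accuracies for three candidate reviewer LLMs.
exam_accuracy = {"reviewer_a": 0.9, "reviewer_b": 0.7, "reviewer_c": 0.4}

# Hypothetical 1-5 ratings that each reviewer gave to each evaluatee's submission.
ratings = {
    "model_x": {"reviewer_a": 4, "reviewer_b": 5, "reviewer_c": 1},
    "model_y": {"reviewer_a": 3, "reviewer_b": 3, "reviewer_c": 5},
    "model_z": {"reviewer_a": 2, "reviewer_b": 1, "reviewer_c": 5},
}

reviewers = select_reviewers(exam_accuracy)
ranking, scores = rank_evaluatees(ratings, reviewers)
print(reviewers)
print(ranking)
```

In this toy data, reviewer_c fails the exam, so its outlying ratings do not influence the final ranking; that filtering step is what separates the peer-review setup from relying on a single evaluator LLM.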
