Alexandre Evfimievski

RAG-ConfusionQA: A Benchmark for Evaluating LLMs on Confusing Questions

Oct 18, 2024

Zhiyuan Peng, Jinming Nian, Alexandre Evfimievski, Yi Fang

Figures 1-4 for RAG-ConfusionQA: A Benchmark for Evaluating LLMs on Confusing Questions

Abstract: Conversational AI agents use Retrieval-Augmented Generation (RAG) to provide verifiable, document-grounded responses to user inquiries. However, many natural questions do not have good answers: about 25% contain false assumptions (Yu et al., 2023; CREPE), and over 50% are ambiguous (Min et al., 2020; AmbigQA). RAG agents need high-quality data to improve their responses to confusing questions. This paper presents a novel synthetic data generation method to efficiently create a diverse set of context-grounded confusing questions from a given document corpus. We conduct an empirical comparative evaluation of several large language models as RAG agents to measure the accuracy of confusion detection and appropriate response generation. We contribute a benchmark dataset to the public domain.

* under review
