We introduce the Overall Performance Index (OPI), an intrinsic metric for evaluating retrieval-augmented generation (RAG) mechanisms in applications involving deep-logic queries. OPI is computed as the harmonic mean of two key metrics: the Logical-Relation Correctness Ratio and the average BERT embedding similarity score between ground-truth and generated answers. We apply OPI to assess the performance of LangChain, a popular RAG tool, using a logical-relation classifier fine-tuned from GPT-4o on the RAG-Dataset-12000 from Hugging Face. Our findings show a strong correlation between BERT embedding similarity scores and extrinsic evaluation scores. Among commonly used retrievers, the cosine-similarity retriever with BERT-based embeddings outperforms the others, while the Euclidean-distance-based retriever performs weakest. Furthermore, we demonstrate that combining multiple retrievers, either algorithmically or by merging their retrieved sentences, yields better performance than any single retriever alone.
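As a minimal sketch of the metric, writing LRCR for the Logical-Relation Correctness Ratio and \(\bar{S}\) for the average BERT embedding similarity score (symbols introduced here for illustration only), the harmonic-mean definition reads

\[
\mathrm{OPI} \;=\; \frac{2 \cdot \mathrm{LRCR} \cdot \bar{S}}{\mathrm{LRCR} + \bar{S}}.
\]

Because the harmonic mean is dominated by the smaller of its two arguments, a RAG pipeline scores well on OPI only if it is strong on both logical-relation correctness and answer similarity.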