Title: Hypothetical Documents or Knowledge Leakage? Rethinking LLM-based Query Expansion

Apr 19, 2025

Yejun Yoon, Jaeyoon Jung, Seunghyun Yoon, Kunwoo Park

Abstract: Query expansion methods powered by large language models (LLMs) have demonstrated effectiveness in zero-shot retrieval tasks. These methods assume that LLMs can generate hypothetical documents that, when incorporated into a query vector, enhance the retrieval of real evidence. However, we challenge this assumption by investigating whether knowledge leakage in benchmarks contributes to the observed performance gains. Using fact verification as a testbed, we analyzed whether the generated documents contained information entailed by ground-truth evidence and assessed their impact on performance. Our findings indicate that performance improvements occurred consistently only for claims whose generated documents included sentences entailed by ground-truth evidence. This suggests that knowledge leakage may be present in these benchmarks, inflating the perceived performance of LLM-based query expansion methods, particularly in real-world scenarios that require retrieving niche or novel knowledge.
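The abstract states only that hypothetical documents are "incorporated into a query vector". Below is a minimal sketch of one common way to do that, assuming embeddings have already been produced by some dense encoder and that incorporation means averaging the query embedding with the hypothetical-document embeddings (the HyDE-style choice). The function names `mean_vector`, `cosine`, `expanded_query_vector`, and `rank_corpus` are hypothetical illustrations and are not taken from the paper's code.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expanded_query_vector(query_vec: Vector, hypo_doc_vecs: Sequence[Vector]) -> Vector:
    # Assumption: the expanded query is the mean of the original query embedding
    # and the embeddings of the LLM-generated hypothetical documents.
    return mean_vector([query_vec, *hypo_doc_vecs])


def rank_corpus(query_vec: Vector, corpus_vecs: Sequence[Vector]) -> List[int]:
    """Return corpus indices sorted by descending cosine similarity to the query."""
    scores = [cosine(query_vec, doc) for doc in corpus_vecs]
    # sorted() is stable, so tied documents keep their original corpus order.
    return sorted(range(len(corpus_vecs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy 2-D embeddings, purely illustrative.
    query = [1.0, 0.0]
    hypothetical_docs = [[0.0, 1.0]]
    corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    q = expanded_query_vector(query, hypothetical_docs)
    print(rank_corpus(q, corpus))  # prints [2, 0, 1]
```

The sketch makes the paper's concern concrete: the expanded vector is pulled toward whatever the LLM wrote, so if the hypothetical documents already paraphrase the ground-truth evidence (knowledge leakage), retrieval improves for reasons unrelated to genuine query expansion.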

Preprint