Title: DAPR: A Benchmark on Document-Aware Passage Retrieval

May 23, 2023

Kexin Wang, Nils Reimers, Iryna Gurevych

Abstract: Recent neural retrieval focuses mainly on ranking short texts and struggles with long documents. Existing work mainly evaluates ranking either passages or whole documents. However, there are many cases where users want to find a relevant passage within a long document from a huge corpus, e.g., legal cases or research papers. In this scenario, the passage often provides little document context, which makes it hard for current approaches to find the correct document and return accurate results. To fill this gap, we propose and name this task Document-Aware Passage Retrieval (DAPR) and build a benchmark that includes multiple datasets from various domains, covering both DAPR and whole-document retrieval. In experiments, we extend state-of-the-art neural passage retrievers with document-level context via different approaches, including prepending a document summary, pooling over passage representations, and hybrid retrieval with BM25. The hybrid-retrieval systems, which perform best overall, improve only marginally on the DAPR tasks while improving significantly on the document-retrieval tasks. This motivates further research into better retrieval systems for the new task. The code and the data are available at https://github.com/kwang2049/dapr

* Work in progress
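
To make the hybrid-retrieval idea in the abstract more concrete, here is a minimal sketch of one way a passage-level neural score could be combined with a document-level BM25 score. The abstract does not specify the fusion formula, so everything here is an assumption for illustration: the function names `min_max_normalize` and `fuse_scores`, the min-max normalization, the weight `alpha`, and the toy scores are all hypothetical and are not taken from the paper or its repository.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score set maps to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 0.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def fuse_scores(
    passage_scores: Dict[str, float],
    doc_scores: Dict[str, float],
    passage_to_doc: Dict[str, str],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Convex combination of a passage's own score and its parent document's score.

    Assumption: `passage_scores` come from a neural passage retriever and
    `doc_scores` from BM25 over whole documents. `alpha` weights the passage side.
    """
    norm_passage = min_max_normalize(passage_scores)
    norm_doc = min_max_normalize(doc_scores)
    fused = {}
    for passage_id, passage_score in norm_passage.items():
        # Passages whose document has no BM25 score get 0.0 document context.
        doc_score = norm_doc.get(passage_to_doc[passage_id], 0.0)
        fused[passage_id] = alpha * passage_score + (1.0 - alpha) * doc_score
    return fused


# Toy, made-up scores: p1 has the best passage score, but its document
# matches the query poorly according to BM25.
passage_scores = {"p1": 0.82, "p2": 0.80, "p3": 0.40}
doc_scores = {"docA": 3.0, "docB": 12.0}
passage_to_doc = {"p1": "docA", "p2": "docB", "p3": "docB"}

fused = fuse_scores(passage_scores, doc_scores, passage_to_doc, alpha=0.7)
ranking: List[str] = sorted(fused, key=fused.get, reverse=True)
print(ranking)
```

In this toy setup the document-level signal moves `p2` ahead of `p1`, even though `p1` has the higher passage-only score. This is the kind of effect document context is meant to provide; how much it helps in practice is what the benchmark measures.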
