Title: DuReader_retrieval: A Large-scale Chinese Benchmark for Passage Retrieval from Web Search Engine

Mar 19, 2022

Yifu Qiu, Hongyu Li, Yingqi Qu, Ying Chen, Qiaoqiao She, Jing Liu, Hua Wu, Haifeng Wang

Abstract: In this paper, we present DuReader_retrieval, a large-scale Chinese dataset for passage retrieval. DuReader_retrieval contains more than 90K queries and over 8M unique passages from Baidu search. To ensure the quality of our benchmark and address the shortcomings of other existing datasets, we (1) reduce the false negatives in the development and testing sets by pooling the results from multiple retrievers with human annotations, and (2) remove questions that are semantically similar between the training set and the development and testing sets. We further introduce two extra out-of-domain testing sets for benchmarking the domain generalization capability. Our experimental results demonstrate that DuReader_retrieval is challenging and that there is still plenty of room for the community to improve, e.g., in generalization across domains, in handling salient phrase and syntax mismatch between query and paragraph, and in robustness. DuReader_retrieval will be publicly available at https://github.com/baidu/DuReader/tree/master/DuReader-Retrieval
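
The pooling step mentioned in the abstract can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' implementation: the function name `pool_candidates`, the retriever labels, the passage ids, and the `depth` parameter are all hypothetical, and the abstract does not state which retrievers or pool depth were actually used. The idea shown is only that the top-ranked passages from several retrievers are merged into one deduplicated candidate set, which would then be passed to human annotators for relevance judgments.

```python
from typing import Dict, List, Set


def pool_candidates(runs: Dict[str, List[str]], depth: int) -> List[str]:
    """Merge the top-`depth` passage ids of each retriever into one pool.

    `runs` maps a (hypothetical) retriever name to its ranked passage ids.
    The result keeps first-seen order and contains each id once.
    """
    seen: Set[str] = set()
    pooled: List[str] = []
    for ranked in runs.values():
        for pid in ranked[:depth]:
            if pid not in seen:
                seen.add(pid)
                pooled.append(pid)
    return pooled


# Hypothetical rankings from two retrievers for a single query.
example_runs = {
    "sparse": ["p1", "p2", "p3"],
    "dense": ["p2", "p4", "p5"],
}
example_pool = pool_candidates(example_runs, depth=2)
```

With these made-up rankings and a depth of 2, the pool holds `p1`, `p2`, and `p4`. A passage that only one retriever surfaces (`p4` here) still reaches the annotators, which is how pooling can reduce false negatives in an evaluation set.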
