Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Feiyang Lv

T2Ranking: A large-scale Chinese Benchmark for Passage Ranking

Apr 07, 2023

Xiaohui Xie, Qian Dong, Bingning Wang, Feiyang Lv, Ting Yao, Weinan Gan, Zhijing Wu, Xiangsheng Li, Haitao Li, Yiqun Liu(+1 more)

Figure 1 for T2Ranking: A large-scale Chinese Benchmark for Passage Ranking

Figure 2 for T2Ranking: A large-scale Chinese Benchmark for Passage Ranking

Figure 3 for T2Ranking: A large-scale Chinese Benchmark for Passage Ranking

Figure 4 for T2Ranking: A large-scale Chinese Benchmark for Passage Ranking

Abstract:Passage ranking involves two stages: passage retrieval and passage re-ranking, which are important and challenging topics for both academics and industries in the area of Information Retrieval (IR). However, the commonly-used datasets for passage ranking usually focus on the English language. For non-English scenarios, such as Chinese, the existing datasets are limited in terms of data scale, fine-grained relevance annotation and false negative issues. To address this problem, we introduce T2Ranking, a large-scale Chinese benchmark for passage ranking. T2Ranking comprises more than 300K queries and over 2M unique passages from real-world search engines. Expert annotators are recruited to provide 4-level graded relevance scores (fine-grained) for query-passage pairs instead of binary relevance judgments (coarse-grained). To ease the false negative issues, more passages with higher diversities are considered when performing relevance annotations, especially in the test set, to ensure a more accurate evaluation. Apart from the textual query and passage data, other auxiliary resources are also provided, such as query types and XML files of documents which passages are generated from, to facilitate further studies. To evaluate the dataset, commonly used ranking models are implemented and tested on T2Ranking as baselines. The experimental results show that T2Ranking is challenging and there is still scope for improvement. The full data and all codes are available at https://github.com/THUIR/T2Ranking/

* This Resource paper has been accepted by the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2023)

Via

Access Paper or Ask Questions

ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding

Aug 05, 2022

Bingning Wang, Feiyang Lv, Ting Yao, Yiming Yuan, Jin Ma, Yu Luo, Haijin Liang

Figure 1 for ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding

Figure 2 for ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding

Figure 3 for ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding

Figure 4 for ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding

Abstract:Visual question answering is an important task in both natural language and vision understanding. However, in most of the public visual question answering datasets such as VQA, CLEVR, the questions are human generated that specific to the given image, such as `What color are her eyes?'. The human generated crowdsourcing questions are relatively simple and sometimes have the bias toward certain entities or attributes. In this paper, we introduce a new question answering dataset based on image-ChiQA. It contains the real-world queries issued by internet users, combined with several related open-domain images. The system should determine whether the image could answer the question or not. Different from previous VQA datasets, the questions are real-world image-independent queries that are more various and unbiased. Compared with previous image-retrieval or image-caption datasets, the ChiQA not only measures the relatedness but also measures the answerability, which demands more fine-grained vision and language reasoning. ChiQA contains more than 40K questions and more than 200K question-images pairs. A three-level 2/1/0 label is assigned to each pair indicating perfect answer, partially answer and irrelevant. Data analysis shows ChiQA requires a deep understanding of both language and vision, including grounding, comparisons, and reading. We evaluate several state-of-the-art visual-language models such as ALBEF, demonstrating that there is still a large room for improvements on ChiQA.

* CIKM2022 camera ready version

Via

Access Paper or Ask Questions