Title: UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers

Oct 26, 2024

Dehai Min, Zhiyang Xu, Guilin Qi, Lifu Huang, Chenyu You

[Figures 1-4 for UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers]

Abstract: Existing information retrieval (IR) models often assume a homogeneous structure for knowledge sources and user queries, which limits their applicability in real-world settings where retrieval is inherently heterogeneous and diverse. In this paper, we introduce UniHGKR, a unified instruction-aware heterogeneous knowledge retriever that (1) builds a unified retrieval space for heterogeneous knowledge and (2) follows diverse user instructions to retrieve knowledge of specified types. UniHGKR is trained in three principal stages: heterogeneous self-supervised pretraining, text-anchored embedding alignment, and instruction-aware retriever fine-tuning, which together enable it to generalize across varied retrieval contexts. The framework is highly scalable, with a BERT-based version and a UniHGKR-7B version built on large language models. We also introduce CompMix-IR, the first native heterogeneous knowledge retrieval benchmark. It includes two retrieval scenarios with various instructions, over 9,400 question-answer (QA) pairs, and a corpus of 10 million entries covering four different types of data. Extensive experiments show that UniHGKR consistently outperforms state-of-the-art methods on CompMix-IR, achieving relative improvements of up to 6.36% and 54.23% in the two scenarios, respectively. Finally, by equipping open-domain heterogeneous QA systems with our retriever, we achieve a new state-of-the-art result on the popular ConvMix task, with an absolute improvement of up to 4.80 points.
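To make the idea of a unified, instruction-aware retrieval space more concrete, here is a toy sketch. It is not UniHGKR's method: the paper trains BERT-based and 7B LLM-based encoders in three stages, whereas this sketch swaps in a plain bag-of-words vector as a stand-in encoder. Its assumptions are that every item, whatever its type, is linearized into text with a type tag prefixed, and that the user instruction is embedded together with the query. The type names (`text`, `table`, `kb`) and all functions (`embed`, `encode_item`, `retrieve`) are hypothetical illustrations.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Item:
    source_type: str  # hypothetical type tag, e.g. "text", "table", "kb"
    content: str      # the item linearized into plain text


def embed(text: str) -> dict[str, float]:
    """Stand-in encoder (not the paper's): L2-normalized sparse bag-of-words vector."""
    counts = Counter(re.findall(r"[a-z0-9_]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(q: dict[str, float], d: dict[str, float]) -> float:
    # Both vectors are unit-length, so their dot product is the cosine similarity.
    return sum(w * d.get(tok, 0.0) for tok, w in q.items())


def encode_item(item: Item) -> dict[str, float]:
    # Assumption: every data type is mapped into one shared space by embedding
    # its linearized text with the type tag prefixed.
    return embed(f"[{item.source_type}] {item.content}")


def retrieve(instruction: str, query: str, corpus: list[Item], k: int = 1) -> list[Item]:
    # The instruction is embedded together with the query, so it shifts the
    # ranking toward items whose text (including the type tag) matches it.
    q = embed(f"{instruction} {query}")
    ranked = sorted(corpus, key=lambda it: cosine(q, encode_item(it)), reverse=True)
    return ranked[:k]


corpus = [
    Item("text", "Lionel Messi was born in Rosario, Argentina."),
    Item("table", "Player: Lionel Messi | Born: Rosario"),
    Item("kb", "Lionel Messi, place of birth, Rosario"),
]
question = "Where was Lionel Messi born?"
print(retrieve("Retrieve a table for:", question, corpus)[0].source_type)         # table
print(retrieve("Retrieve a text passage for:", question, corpus)[0].source_type)  # text
```

The same question returns a table row or a text passage depending only on the instruction, which is the behavior the abstract describes. In this toy, that behavior comes from lexical overlap with the type tag. In UniHGKR it is learned during instruction-aware fine-tuning.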
