Xiangxu Zhang

AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels

Oct 26, 2024

Lei Li, Xiangxu Zhang, Xiao Zhou, Zheng Liu

Abstract: Medical information retrieval (MIR) is essential for retrieving relevant medical knowledge from diverse sources, including electronic health records, scientific literature, and medical databases. However, achieving effective zero-shot dense retrieval in the medical domain poses substantial challenges due to the lack of relevance-labeled data. In this paper, we introduce a novel approach called Self-Learning Hypothetical Document Embeddings (SL-HyDE) to tackle this issue. SL-HyDE uses large language models (LLMs) to generate hypothetical documents from a given query. These generated documents encapsulate key medical context, guiding a dense retriever toward the most relevant documents. The self-learning framework progressively refines both pseudo-document generation and retrieval, using unlabeled medical corpora without requiring any relevance-labeled data. Additionally, we present the Chinese Medical Information Retrieval Benchmark (CMIRB), a comprehensive evaluation framework grounded in real-world medical scenarios, encompassing five tasks and ten datasets. By benchmarking ten models on CMIRB, we establish a rigorous standard for evaluating medical information retrieval systems. Experimental results demonstrate that SL-HyDE significantly surpasses existing methods in retrieval accuracy while showing strong generalization and scalability across various LLM and retriever configurations. CMIRB data and evaluation code are publicly available at: https://github.com/CMIRB-benchmark/CMIRB.
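The retrieval step described in the abstract (generate a hypothetical document from the query, embed it, and rank corpus documents against it) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function stands in for a dense retriever, `generate_hypothetical_document` is a hypothetical stub that returns a canned answer in place of an LLM call, and the tiny corpus is invented for illustration. The self-learning stage that refines the generator and retriever is not shown.

```python
import math
import re
from collections import Counter

# Toy corpus, invented for illustration only.
CORPUS = [
    "Metformin is a first-line medication for type 2 diabetes and lowers blood glucose.",
    "Ibuprofen is a nonsteroidal anti-inflammatory drug used for pain and fever.",
    "Amoxicillin is an antibiotic used to treat bacterial infections.",
]
QUERY = "What drug treats high blood sugar?"


def embed(text):
    """Assumption: a bag-of-words count vector stands in for a dense encoder."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def generate_hypothetical_document(query):
    """Hypothetical stub for the LLM generator: returns a canned pseudo-document.

    A real system would prompt an LLM here; unknown queries fall back to the
    query text itself.
    """
    canned = {
        QUERY: (
            "High blood sugar in type 2 diabetes is commonly treated with "
            "metformin, which lowers blood glucose."
        )
    }
    return canned.get(query, query)


def hyde_retrieve(query, corpus, k=1):
    """Rank corpus documents by similarity to the generated pseudo-document."""
    pseudo_doc = generate_hypothetical_document(query)
    pseudo_vec = embed(pseudo_doc)
    ranked = sorted(corpus, key=lambda doc: cosine(pseudo_vec, embed(doc)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    for doc in hyde_retrieve(QUERY, CORPUS, k=1):
        print(doc)
```

The point of the sketch is the indirection: the raw query shares almost no vocabulary with the relevant document, while the generated pseudo-document does, so the retriever matches on the pseudo-document rather than on the query.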

* 15 pages, 3 figures
