Title: Anatomy-Aware Conditional Image-Text Retrieval

Mar 10, 2025

Meng Zheng, Jiajin Zhang, Benjamin Planche, Zhongpai Gao, Terrence Chen, Ziyan Wu


Abstract: Image-Text Retrieval (ITR) finds broad applications in healthcare, aiding clinicians and radiologists by automatically retrieving relevant patient cases from a database given a query image and/or report, enabling more efficient clinical diagnosis and treatment, especially for rare diseases. However, conventional ITR systems typically rely only on global image or text representations to measure patient image/report similarity, which overlooks local distinctiveness across patient cases and often results in suboptimal retrieval performance. In this paper, we propose an Anatomical Location-Conditioned Image-Text Retrieval (ALC-ITR) framework, which, given a query image and the associated suspicious anatomical region(s), aims to retrieve similar patient cases exhibiting the same disease or symptoms in the same anatomical region. To perform location-conditioned multimodal retrieval, we learn a medical Relevance-Region-Aligned Vision Language (RRA-VL) model with semantic global-level and region-/word-level alignment to produce generalizable, well-aligned multimodal representations. Additionally, we perform location-conditioned contrastive learning to further exploit cross-pair region-level contrast for improved multimodal retrieval. We show that our proposed RRA-VL achieves state-of-the-art localization performance in phrase-grounding tasks, and satisfactory multimodal retrieval performance with or without location conditioning. Finally, we thoroughly investigate the generalizability and explainability of our proposed ALC-ITR system in providing explanations and preliminary diagnosis reports given retrieved patient cases (conditioned on anatomical regions), with proper off-the-shelf LLM prompts.

* 16 pages, 10 figures
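
The abstract mentions contrastive learning over region-level representations but gives no formulation. As a rough illustration only, the sketch below implements a generic symmetric InfoNCE loss and a cosine-similarity retrieval step in plain Python. It is not the paper's RRA-VL objective: the function names, the temperature value, and the premise that each image embedding has already been pooled over the queried anatomical region are all assumptions made for this example.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity_matrix(image_embs, text_embs):
    """Return sims[i][j] = cosine similarity of image i and text j."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the target for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(image_embs, text_embs, temperature=0.07):
    """Generic symmetric InfoNCE: matched pairs share the same index.

    Hypothetical setup: image_embs[i] is assumed to be pooled over the
    queried anatomical region, text_embs[i] is the paired report text.
    """
    sims = cosine_similarity_matrix(image_embs, text_embs)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


def retrieve(query_emb, candidate_embs, top_k=1):
    """Rank candidates by cosine similarity to the query embedding."""
    sims = cosine_similarity_matrix([query_emb], candidate_embs)[0]
    ranked = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)
    return ranked[:top_k]
```

With a loss of this kind, a batch whose image and text embeddings are correctly paired scores lower than the same batch with the pairs swapped, and `retrieve` returns the indices of the stored cases closest to a region-conditioned query.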
