Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models

Apr 19, 2024

Konstantinos Vilouras, Pedro Sanchez, Alison Q. O'Neil, Sotirios A. Tsaftaris

Figure 1 for Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models

Figure 2 for Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models

Figure 3 for Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models

Figure 4 for Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models

Share this with someone who'll enjoy it:

Abstract:Localizing the exact pathological regions in a given medical scan is an important imaging problem that requires a large amount of bounding box ground truth annotations to be accurately solved. However, there exist alternative, potentially weaker, forms of supervision, such as accompanying free-text reports, which are readily available. The task of performing localization with textual guidance is commonly referred to as phrase grounding. In this work, we use a publicly available Foundation Model, namely the Latent Diffusion Model, to solve this challenging task. This choice is supported by the fact that the Latent Diffusion Model, despite being generative in nature, contains mechanisms (cross-attention) that implicitly align visual and textual features, thus leading to intermediate representations that are suitable for the task at hand. In addition, we aim to perform this task in a zero-shot manner, i.e., without any further training on target data, meaning that the model's weights remain frozen. To this end, we devise strategies to select features and also refine them via post-processing without extra learnable parameters. We compare our proposed method with state-of-the-art approaches which explicitly enforce image-text alignment in a joint embedding space via contrastive learning. Results on a popular chest X-ray benchmark indicate that our method is competitive wih SOTA on different types of pathology, and even outperforms them on average in terms of two metrics (mean IoU and AUC-ROC). Source code will be released upon acceptance.

* 8 pages, 3 figures, submitted to IEEE J-BHI Special Issue on Foundation Models in Medical Imaging

View paper on

Share this with someone who'll enjoy it:

Title:Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models

Paper and Code