Corentin Royer

DeViDe: Faceted medical knowledge for improved medical vision-language pre-training

Apr 04, 2024

Haozhe Luo, Ziyu Zhou, Corentin Royer, Anjany Sekuboyina, Bjoern Menze

Abstract: Vision-language pre-training for chest X-rays has made significant strides, primarily by utilizing paired radiographs and radiology reports. However, existing approaches often face challenges in encoding medical knowledge effectively. While radiology reports provide insights into the current disease manifestation, medical definitions (as used by contemporary methods) tend to be overly abstract, creating a gap in knowledge. To address this, we propose DeViDe, a novel transformer-based method that leverages radiographic descriptions from the open web. These descriptions outline general visual characteristics of diseases in radiographs, and when combined with abstract definitions and radiology reports, provide a holistic snapshot of knowledge. DeViDe incorporates three key features for knowledge-augmented vision-language alignment: First, a large language model-based augmentation is employed to homogenise medical knowledge from diverse sources. Second, this knowledge is aligned with image information at various levels of granularity. Third, a novel projection layer is proposed to handle the complexity of aligning each image with multiple descriptions arising in a multi-label setting. In zero-shot settings, DeViDe performs comparably to fully supervised models on external datasets and achieves state-of-the-art results on three large-scale datasets. Additionally, fine-tuning DeViDe on four downstream tasks and six segmentation tasks showcases its superior performance across data from diverse distributions.

* arXiv admin note: text overlap with arXiv:2208.04060 by other authors

MultiMedEval: A Benchmark and a Toolkit for Evaluating Medical Vision-Language Models

Feb 16, 2024

Corentin Royer, Bjoern Menze, Anjany Sekuboyina

Abstract: We introduce MultiMedEval, an open-source toolkit for fair and reproducible evaluation of large medical vision-language models (VLMs). MultiMedEval comprehensively assesses the models' performance on a broad array of six multi-modal tasks, conducted over 23 datasets, and spanning over 11 medical domains. The chosen tasks and performance metrics are based on their widespread adoption in the community and their diversity, ensuring a thorough evaluation of the model's overall generalizability. We open-source a Python toolkit (github.com/corentin-ryr/MultiMedEval) with a simple interface and setup process, enabling the evaluation of any VLM in just a few lines of code. Our goal is to simplify the intricate landscape of VLM evaluation, thus promoting fair and uniform benchmarking of future models.

* Under review at MIDL 2024
