Title: PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

May 24, 2023

Xiaoman Zhang, Chaoyi Wu, Ziheng Zhao, Weixiong Lin, Ya Zhang, Yanfeng Wang, Weidi Xie

Abstract: In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA), which is crucial for efficiently interpreting medical images that carry vital, clinically relevant information. First, we reframe MedVQA as a generation task that naturally follows human-machine interaction, and we propose a generative model for medical visual understanding that aligns visual information from a pre-trained vision encoder with a large language model. Second, we establish a scalable pipeline to construct a large-scale medical visual question-answering dataset, named PMC-VQA, which contains 227k VQA pairs over 149k images covering various modalities and diseases. Third, we pre-train our proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD and SLAKE, outperforming existing work by a large margin. Additionally, we propose a manually verified test set that is significantly more challenging, one that even the best models struggle to solve.
