Title: MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale

Apr 18, 2024

Xiaotang Gai, Chenyi Zhou, Jiaxiang Liu, Yang Feng, Jian Wu, Zuozhu Liu

Abstract: Medical Visual Question Answering (MedVQA), which offers language responses to image-based medical inquiries, is a challenging task and a significant advance in healthcare. It helps medical experts interpret medical images swiftly, thereby enabling faster and more accurate diagnoses. However, the interpretability and transparency of existing MedVQA solutions are often limited, making their decision-making processes hard to understand. To address this issue, we devise a semi-automated annotation process to streamline data preparation and build two new benchmark MedVQA datasets, R-RAD and R-SLAKE. The R-RAD and R-SLAKE datasets provide intermediate medical decision-making rationales, generated by multimodal large language models and human annotations, for the question-answering pairs in the existing MedVQA datasets VQA-RAD and SLAKE. Moreover, we design a novel framework that fine-tunes lightweight pretrained generative models by incorporating medical decision-making rationales into the training process. The framework includes three distinct strategies to generate decision outcomes and corresponding rationales, thereby clearly showcasing the medical decision-making process during reasoning. Extensive experiments demonstrate that our method achieves an accuracy of 83.5% on R-RAD and 86.3% on R-SLAKE, significantly outperforming existing state-of-the-art baselines. Dataset and code will be released.