Title: MedXChat: Bridging CXR Modalities with a Unified Multimodal Large Model

Dec 04, 2023

Ling Yang, Zhanyu Wang, Luping Zhou

Abstract: Despite the success of Large Language Models (LLMs) in general image tasks, a gap persists in the medical field for a multimodal large model adept at handling the nuanced diversity of medical images. To address this, we propose MedXChat, a unified multimodal large model designed for seamless interactions between medical assistants and users. MedXChat encompasses three key functionalities: CXR (Chest X-ray)-to-Report generation, CXR-based visual question-answering (VQA), and Text-to-CXR synthesis. Our contributions are as follows. Firstly, our model showcases exceptional cross-task adaptability, displaying adeptness across all three defined tasks and outperforming the benchmark models on the MIMIC dataset in medical multimodal applications. Secondly, we introduce an innovative Text-to-CXR synthesis approach that utilizes instruction-following capabilities within the Stable Diffusion (SD) architecture. This technique integrates smoothly with the existing model framework and requires no extra parameters, thereby maintaining SD's generative strength while also giving it the capacity to render fine-grained medical images with high fidelity. Comprehensive experiments validate MedXChat's synergistic enhancement across all tasks. Our instruction data and model will be open-sourced.
