Abstract: Medical report generation is the task of automatically writing radiology reports for chest X-ray images. Manually composing these reports is time-consuming and prone to human error, so automating the process can help reduce the burden on radiologists and promote greater clinical automation. In this work, we propose a new framework leveraging vision-enabled Large Language Models (LLMs) for medical report generation. We introduce a lightweight solution that achieves better or comparable performance relative to previous solutions on this task. We conduct extensive experiments exploring different model sizes and enhancement approaches, such as prefix tuning, to improve the text generation abilities of the LLMs. We evaluate our approach on a prominent large-scale radiology report dataset, MIMIC-CXR. Our results demonstrate the capability of our resource-efficient framework to generate patient-specific reports with strong medical contextual understanding and high precision.
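The abstract does not give implementation details, but the prefix-tuning idea it mentions is standard: freeze the LLM and learn only a small set of "virtual token" vectors prepended to each attention layer. The following minimal sketch uses the HuggingFace peft library with a GPT-2 backbone purely for illustration; the model choice, number of virtual tokens, and the example report text are assumptions, not the paper's configuration, and the paper additionally conditions generation on X-ray image features, which is omitted here.

# Minimal sketch of prefix tuning a causal LLM for report generation.
# Model name and hyperparameters are illustrative, not the paper's.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PrefixTuningConfig, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
base_lm = AutoModelForCausalLM.from_pretrained("gpt2")

# Prefix tuning: learn key/value vectors for a handful of virtual tokens
# prepended at every attention layer; the LLM weights stay frozen.
config = PrefixTuningConfig(task_type="CAUSAL_LM", num_virtual_tokens=20)
model = get_peft_model(base_lm, config)
model.print_trainable_parameters()  # only the prefix parameters train

# One training step on a hypothetical report string; in the paper's setting
# the input would additionally be conditioned on chest X-ray features.
batch = tokenizer("Findings: The lungs are clear. No pleural effusion.",
                  return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()

Because only the prefix parameters receive gradients, this kind of setup keeps the trainable footprint small, which is consistent with the resource-efficient framing of the abstract.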
Abstract: Generating Natural Language Explanations (NLEs) for model predictions on medical images, particularly those depicting thoracic pathologies, remains a critical and challenging task. Existing methodologies often struggle due to general models' insufficient domain-specific medical knowledge and privacy concerns associated with retrieval-based augmentation techniques. To address these issues, we propose a novel Vision-Language framework augmented with a Knowledge Graph (KG)-based datastore, which enhances the model's understanding by incorporating additional domain-specific medical knowledge essential for generating accurate and informative NLEs. Our framework employs a KG-based retrieval mechanism that not only improves the precision of the generated explanations but also preserves data privacy by avoiding direct data retrieval. The KG datastore is designed as a plug-and-play module, allowing for seamless integration with various model architectures. We introduce and evaluate three distinct frameworks within this paradigm: KG-LLaVA, which integrates the pre-trained LLaVA model with KG-RAG; Med-XPT, a custom framework combining MedCLIP, a transformer-based projector, and GPT-2; and Bio-LLaVA, which adapts LLaVA by incorporating the Bio-ViT-L vision model. These frameworks are validated on the MIMIC-NLE dataset, where they achieve state-of-the-art results, underscoring the effectiveness of KG augmentation in generating high-quality NLEs for thoracic pathologies.
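The abstract's key privacy argument is that the datastore holds graph triples of medical knowledge rather than patient examples, so retrieval never exposes training data. The sketch below illustrates that retrieval pattern only; the triples, the placeholder embedding function, and the prompt format are all invented for illustration (a real system would use a shared text/image encoder such as MedCLIP, which the paper names as one component).

# Sketch of the KG-RAG idea: retrieve domain knowledge as graph triples
# rather than raw patient data, so no training examples are exposed.
# Triples, embedder, and prompt format here are illustrative assumptions.
import zlib
import numpy as np

kg_triples = [
    ("pleural effusion", "is_indicated_by", "blunted costophrenic angle"),
    ("atelectasis", "associated_with", "volume loss"),
    ("cardiomegaly", "is_indicated_by", "enlarged cardiac silhouette"),
]

def embed(text: str) -> np.ndarray:
    # Placeholder deterministic embedding; a real system would use a
    # learned encoder so queries and triples share one embedding space.
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    v = rng.standard_normal(256)
    return v / np.linalg.norm(v)

datastore = np.stack([embed(" ".join(t)) for t in kg_triples])

def retrieve(query_vec: np.ndarray, k: int = 2):
    scores = datastore @ query_vec  # cosine similarity (unit vectors)
    return [kg_triples[i] for i in np.argsort(-scores)[:k]]

# Retrieved triples are serialized into the LLM prompt as extra context.
context = "; ".join(" ".join(t) for t in retrieve(embed("enlarged heart")))
prompt = f"Knowledge: {context}\nExplain the prediction: Cardiomegaly."

Because the datastore is just an embedding matrix plus triples, it can be swapped in behind different backbones, which matches the plug-and-play claim for KG-LLaVA, Med-XPT, and Bio-LLaVA.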
Abstract: In this paper, a nonlinear approach to separating different motion types in video data is proposed. This is particularly relevant in dynamic medical imaging (e.g., PET, MRI), where patient motion poses a significant challenge due to its effects both on image reconstruction and on subsequent interpretation. Here, a new method is proposed in which dynamic images are represented as the forward mapping of a sequence of latent variables through a generator neural network. The latent variables are structured so that temporal variations in the data are represented via dynamic latent variables, which are independent of static latent variables characterizing the general structure of the frames. In particular, different kinds of motion are characterized independently of each other via latent-space disentanglement, using one-dimensional prior information on all but one of the motion types. This representation makes it possible to freeze any selection of motion types and to obtain accurate, independent representations of the remaining dynamics of interest. Moreover, the proposed algorithm is training-free, i.e., all network parameters are learned directly from a single video rather than from a training dataset. We illustrate the performance of this method on phantom and real-data MRI examples, where we successfully separate respiratory and cardiac motion.
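To make the latent structure concrete, the sketch below fits a tiny generator to a single synthetic video with a shared static latent plus separate one-dimensional latents for respiratory and cardiac motion, then "freezes" respiration by holding its latent fixed. The architecture, latent sizes, and optimization schedule are illustrative assumptions, not the paper's actual network.

# Sketch of a latent-structured generator fit to one video (no pre-training):
# static latents encode frame anatomy, separate 1-D dynamic latents encode
# each motion type. Shapes and architecture are illustrative only.
import torch
import torch.nn as nn

T, H, W = 50, 64, 64                          # frames and frame size
z_static = nn.Parameter(torch.randn(1, 14))   # shared across all frames
z_resp   = nn.Parameter(torch.randn(T, 1))    # respiratory latent per frame
z_card   = nn.Parameter(torch.randn(T, 1))    # cardiac latent per frame

generator = nn.Sequential(
    nn.Linear(16, 256), nn.ReLU(),
    nn.Linear(256, H * W),
)

def render(z_r, z_c):
    # Concatenate static and dynamic latents, map each frame through G.
    z = torch.cat([z_static.expand(T, -1), z_r, z_c], dim=1)
    return generator(z).view(T, H, W)

video = torch.randn(T, H, W)                  # stand-in for measured frames
opt = torch.optim.Adam([z_static, z_resp, z_card,
                        *generator.parameters()], lr=1e-3)
for _ in range(100):                          # fit directly to this one video
    opt.zero_grad()
    loss = ((render(z_resp, z_card) - video) ** 2).mean()
    loss.backward()
    opt.step()

# "Freezing" respiration: hold z_resp at its mean so only cardiac motion remains.
cardiac_only = render(z_resp.mean(0, keepdim=True).expand(T, -1), z_card)

In the paper's setting, the one-dimensional prior signals (e.g., a respiratory trace) would constrain all but one of the dynamic latents so the remaining motion type is disentangled automatically; this sketch omits that prior term for brevity.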