Md. Rakibul Islam

Jana

Privacy-Preserving Chest X-ray Report Generation via Multimodal Federated Learning with ViT and GPT-2

May 27, 2025

Md. Zahid Hossain, Mustofa Ahmed, Most. Sharmin Sultana Samu, Md. Rakibul Islam

Abstract: The automated generation of radiology reports from chest X-ray images holds significant promise for enhancing diagnostic workflows while preserving patient privacy. Traditional centralized approaches often require transferring sensitive data, which raises privacy concerns. To address this, the study proposes a Multimodal Federated Learning framework for chest X-ray report generation using the IU-Xray dataset. The system uses a Vision Transformer (ViT) as the encoder and GPT-2 as the report generator, enabling decentralized training without sharing raw data. Three Federated Learning (FL) aggregation strategies were evaluated: FedAvg, Krum Aggregation and a novel Loss-aware Federated Averaging (L-FedAvg). Among these, Krum Aggregation demonstrated superior performance across lexical and semantic evaluation metrics such as ROUGE, BLEU, BERTScore and RaTEScore. The results show that FL can match or surpass centralized models in generating clinically relevant and semantically rich radiology reports. This lightweight, privacy-preserving framework paves the way for collaborative medical AI development without compromising data confidentiality.
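
To make two of the aggregation strategies named in the abstract concrete, here is a minimal sketch of standard FedAvg and Krum over flat parameter vectors. It is not the authors' implementation: the function names, the list-of-floats representation and the `num_byzantine` parameter are assumptions made for illustration. L-FedAvg is not sketched because the abstract does not define it.

```python
from typing import List, Sequence

Vector = List[float]


def fedavg(updates: Sequence[Vector], num_samples: Sequence[int]) -> Vector:
    """FedAvg: average client vectors, weighted by local sample counts."""
    total = sum(num_samples)
    dim = len(updates[0])
    return [
        sum(w[j] * n for w, n in zip(updates, num_samples)) / total
        for j in range(dim)
    ]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def krum(updates: Sequence[Vector], num_byzantine: int) -> Vector:
    """Krum: return the single client vector closest to its neighbours.

    Each client is scored by the sum of squared distances to its
    n - f - 2 nearest other clients (f = num_byzantine, an assumed
    upper bound on faulty clients); the lowest score wins.
    """
    n = len(updates)
    k = n - num_byzantine - 2
    if k < 1:
        raise ValueError("Krum needs at least num_byzantine + 3 clients")
    scores = []
    for i, u in enumerate(updates):
        dists = sorted(
            squared_distance(u, v) for j, v in enumerate(updates) if j != i
        )
        scores.append(sum(dists[:k]))
    best = min(range(n), key=scores.__getitem__)
    return list(updates[best])


# Hypothetical client updates: four similar vectors and one outlier.
clients = [[1.0, 1.0], [1.2, 1.0], [1.0, 0.9], [0.7, 1.0], [10.0, -10.0]]
print(fedavg(clients, [1, 1, 1, 1, 1]))  # mean is pulled toward the outlier
print(krum(clients, num_byzantine=1))    # picks one of the clustered vectors
```

The contrast is the point of the example: FedAvg blends every client into the global model, while Krum discards all but one update, which makes it robust to a few divergent clients at the cost of ignoring most of the data in each round.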

* Preprint, manuscript under review

Vision-Language Models for Automated Chest X-ray Interpretation: Leveraging ViT and GPT-2

Jan 21, 2025

Md. Rakibul Islam, Md. Zahid Hossain, Mustofa Ahmed, Most. Sharmin Sultana Samu

Abstract: Radiology plays a pivotal role in modern medicine due to its non-invasive diagnostic capabilities. However, the manual generation of unstructured medical reports is time-consuming and error-prone, creating a significant bottleneck in clinical workflows. Despite advancements in AI-generated radiology reports, challenges remain in achieving detailed and accurate report generation. In this study, we evaluate different combinations of multimodal models that integrate Computer Vision and Natural Language Processing to generate comprehensive radiology reports. We employ a pretrained Vision Transformer (ViT-B16) and a SWIN Transformer as the image encoders, with BART and GPT-2 serving as the textual decoders. We use chest X-ray images and reports from the IU-Xray dataset to evaluate the usability of the SWIN Transformer-BART, SWIN Transformer-GPT-2, ViT-B16-BART and ViT-B16-GPT-2 models for report generation, with the aim of finding the best combination. The SWIN-BART model performs best among the four, achieving remarkable results on almost all evaluation metrics, including ROUGE, BLEU and BERTScore.
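
As a rough illustration of what a lexical metric such as ROUGE measures when comparing a generated report with a reference, the sketch below computes ROUGE-1 (clipped unigram overlap) precision, recall and F1. It assumes lowercase whitespace tokenization with no stemming, so its scores may differ from those of the tooling used in the paper; the function name and the two example sentences are made up for illustration.

```python
from collections import Counter
from typing import Dict


def rouge1(candidate: str, reference: str) -> Dict[str, float]:
    """ROUGE-1 precision, recall and F1 from clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical generated and reference report sentences.
scores = rouge1(
    "no acute cardiopulmonary abnormality",
    "no acute cardiopulmonary disease",
)
print(scores)
```

Because the score depends only on shared tokens, two reports with opposite clinical meaning can still score highly, which is one reason lexical metrics are usually paired with semantic ones such as BERTScore.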

* Preprint, manuscript under review

Preference-Guided Planning: An Active Elicitation Approach

Apr 19, 2018

Mayukh Das, Phillip Odom, Md. Rakibul Islam, Janardhan Rao Doppa, Dan Roth, Sriraam Natarajan

Abstract: Planning with preferences has been employed extensively to quickly generate high-quality plans. However, it may be difficult for the human expert to supply this information without knowledge of the reasoning employed by the planner and the distribution of planning problems. We consider the problem of actively eliciting preferences from a human expert during the planning process. Specifically, we study this problem in the context of the Hierarchical Task Network (HTN) planning framework as it allows easy interaction with the human. Our experimental results on several diverse planning domains show that the preferences gathered using the proposed approach improve the quality and speed of the planner, while reducing the burden on the human expert.

* Under Review at Knowledge-Based Systems (Elsevier); "Extended Abstract" accepted and to appear at AAMAS 2018
