Title: Mitigating Heterogeneity in Federated Multimodal Learning with Biomedical Vision-Language Pre-training

Apr 05, 2024

Zitao Shuai, Liyue Shen

Abstract: Vision-language pre-training (VLP) has emerged as an efficient scheme for multimodal representation learning, but it requires large-scale multimodal data for pre-training, which is an obstacle especially for biomedical applications. To overcome this data limitation, federated learning (FL) is a promising strategy for scaling up the dataset for biomedical VLP while protecting data privacy. However, client data are often heterogeneous in real-world scenarios, and we observe that local training on heterogeneous client data distorts multimodal representation learning and leads to biased cross-modal alignment. To address this challenge, we propose the Federated distributional Robust Guidance-Based (FedRGB) learning framework for federated VLP with robustness to data heterogeneity. Specifically, we use a guidance-based local training scheme to reduce feature distortions, and employ a distribution-based min-max optimization to learn unbiased cross-modal alignment. Experiments on real-world datasets show that our method successfully promotes efficient federated multimodal learning for biomedical VLP under data heterogeneity.
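
The abstract names two generic ingredients, federated aggregation and min-max (worst-case) optimization, without giving details. The sketch below illustrates the textbook versions of both, not FedRGB itself: a size-weighted average of client parameters in the style of FedAvg, and an exponentiated reweighting step that shifts weight toward the groups with the highest loss. The function names `fedavg` and `worst_case_weights`, the dictionary-of-lists parameter layout, and the `step` parameter are all assumptions made for illustration and do not come from the paper.

```python
import math
from typing import Dict, List

# Hypothetical layout: each client's parameters are a dict of flat float lists.
Params = Dict[str, List[float]]


def fedavg(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters, weighting each client by its local dataset size."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one dataset size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total dataset size must be positive")
    averaged: Params = {}
    for name, reference in client_params[0].items():
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(len(reference))
        ]
    return averaged


def worst_case_weights(
    losses: List[float], weights: List[float], step: float = 1.0
) -> List[float]:
    """One exponentiated update: groups with higher loss receive more weight.

    This is the inner 'max' step of a generic min-max objective; the outer
    'min' step would train the model on the loss reweighted by the result.
    """
    scaled = [w * math.exp(step * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


if __name__ == "__main__":
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    print(fedavg(clients, [1, 3]))
    print(worst_case_weights([1.0, 2.0], [0.5, 0.5]))
```

Plain size-weighted averaging treats all clients as samples from one distribution, which is the assumption that heterogeneous client data breaks. The reweighting step shows, in the simplest possible form, how a min-max objective pushes training toward the worst-performing group instead of the average.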
