Title: Fine-tuning Multimodal Transformers on Edge: A Parallel Split Learning Approach

Feb 10, 2025

Timo Fudala, Vasileios Tsouvalas, Nirvana Meratnia

Abstract: Multimodal transformers integrate diverse data types such as images, audio, and text, advancing tasks such as audio-visual understanding and image-text retrieval; yet their large parameter counts limit deployment on resource-constrained edge devices. Split Learning (SL), which partitions a model at a designated cut-layer to offload compute-intensive operations to the server, offers a promising approach for distributed training of multimodal transformers, though its application remains underexplored. We present MPSL, a parallel SL approach for computationally efficient fine-tuning of multimodal transformers in a distributed manner, while eliminating label sharing, client synchronization, and per-client sub-model management. MPSL employs lightweight client-side tokenizers and a unified modality-agnostic encoder, allowing flexible adaptation to task-specific needs. Our evaluation across 7 multimodal datasets demonstrates that MPSL matches or outperforms Federated Learning, reduces client-side computations by 250x, and achieves superior scalability in communication cost with model growth. Through extensive analysis, we highlight task suitability, trade-offs, and scenarios where MPSL excels, inspiring further exploration.

* 10 pages, 4 figures, submitted to IJCAI 2025
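
The cut-layer idea in the abstract can be illustrated with a toy sketch. This is not the MPSL implementation: it is a generic split-learning training step on scalar weights, and the class names (`Client`, `Server`), the squared-error loss, and the choice to compute the loss on the client so that labels never leave it are all assumptions made for illustration. The abstract does not describe how MPSL itself avoids label sharing.

```python
# Toy split learning with a single cut-layer (illustrative assumption, not MPSL).
# The client holds a small "front" layer and the labels; the server holds the
# heavier "back" layer. Only activations and gradients cross the boundary.

class Client:
    def __init__(self, weight, lr):
        self.weight = weight
        self.lr = lr
        self.x = None

    def forward(self, x):
        # Compute the cut-layer activation that is sent to the server.
        self.x = x
        return self.weight * x

    def loss_grad(self, y_hat, y):
        # Gradient of 0.5 * (y_hat - y)^2 w.r.t. y_hat; the label y stays local.
        return y_hat - y

    def backward(self, grad_h):
        # Finish backpropagation using the gradient returned by the server.
        self.weight -= self.lr * grad_h * self.x


class Server:
    def __init__(self, weight, lr):
        self.weight = weight
        self.lr = lr
        self.h = None

    def forward(self, h):
        # Complete the forward pass from the received activation.
        self.h = h
        return self.weight * h

    def backward(self, grad_out):
        # Gradient w.r.t. the activation, computed with the pre-update weight.
        grad_h = grad_out * self.weight
        self.weight -= self.lr * grad_out * self.h
        return grad_h


def train_step(client, server, x, y):
    h = client.forward(x)             # client -> server: activation
    y_hat = server.forward(h)         # server -> client: prediction
    g = client.loss_grad(y_hat, y)    # client -> server: loss gradient only
    grad_h = server.backward(g)       # server -> client: activation gradient
    client.backward(grad_h)
    return 0.5 * (y_hat - y) ** 2


if __name__ == "__main__":
    client, server = Client(0.5, 0.1), Server(0.5, 0.1)
    for step in range(5):
        print(step, train_step(client, server, x=1.0, y=2.0))
```

In this sketch the server never sees the raw input `x` or the label `y`; it receives only the activation and the gradient of the loss with respect to its output. A real system would replace the scalar layers with the client-side tokenizers and the server-side encoder that the abstract mentions.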
