Title: Enhancing Cross-Modal Fine-Tuning with Gradually Intermediate Modality Generation

Jun 13, 2024

Lincan Cai, Shuang Li, Wenxuan Ma, Jingxuan Kang, Binhui Xie, Zixun Sun, Chengwei Zhu

Abstract: Large-scale pretrained models have proven immensely valuable for data-rich modalities such as text and images. However, fine-tuning these models for certain specialized modalities, such as protein sequences and cosmic rays, is challenging because of the significant modality discrepancy and the scarcity of labeled data. In this paper, we propose an end-to-end method, PaRe, to enhance cross-modal fine-tuning, aiming to transfer a large-scale pretrained model to various target modalities. PaRe employs a gating mechanism to select key patches from both source and target data. Through a modality-agnostic Patch Replacement scheme, these patches are preserved and combined to construct data-rich intermediate modalities ranging from easy to hard. By generating intermediate modalities gradually, we can not only bridge the modality gap effectively, improving the stability and transferability of cross-modal fine-tuning, but also address the limited data in the target modality by leveraging the enriched intermediate modality data. Compared with hand-designed, general-purpose, task-specific, and state-of-the-art cross-modal fine-tuning approaches, PaRe demonstrates superior performance across three challenging benchmarks encompassing more than ten modalities.
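
To make the patch-replacement idea concrete, here is a minimal sketch of how gated patches from two modalities might be combined into an intermediate sample. It is not the authors' implementation: the function names `patch_replace` and `ratio_schedule`, the use of precomputed gate scores, the linear schedule, and the reading of "easy to hard" as "mostly source patches first, mostly target patches later" are all assumptions made for illustration.

```python
from typing import List, Sequence, TypeVar

Patch = TypeVar("Patch")


def patch_replace(
    source: Sequence[Patch],
    target: Sequence[Patch],
    src_scores: Sequence[float],
    tgt_scores: Sequence[float],
    target_ratio: float,
) -> List[Patch]:
    """Build one intermediate sample from two equal-length patch sequences.

    Hypothetical scheme: keep the highest-scoring target patches in place
    (a fraction `target_ratio` of them) and overwrite the remaining
    positions with the highest-scoring source patches.
    """
    n = len(target)
    n_keep = round(target_ratio * n)
    # Positions ranked by gate score, best first (scores are assumed given).
    tgt_order = sorted(range(n), key=lambda i: -tgt_scores[i])
    src_order = sorted(range(len(source)), key=lambda i: -src_scores[i])
    mixed = list(target)
    # The lowest-scoring target positions receive the best source patches.
    for pos, src_i in zip(tgt_order[n_keep:], src_order):
        mixed[pos] = source[src_i]
    return mixed


def ratio_schedule(step: int, total_steps: int) -> float:
    """Assumed linear schedule: the share of target patches grows from 0 to 1."""
    return min(1.0, step / max(1, total_steps))


source = ["s0", "s1", "s2", "s3"]
target = ["t0", "t1", "t2", "t3"]
mixed = patch_replace(
    source, target,
    src_scores=[0.1, 0.9, 0.5, 0.3],
    tgt_scores=[0.8, 0.2, 0.6, 0.4],
    target_ratio=ratio_schedule(step=5, total_steps=10),
)
print(mixed)
```

In this sketch, early training steps yield samples dominated by source patches, and later steps yield samples dominated by target patches, so the model would see a gradual shift between modalities rather than an abrupt one.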
