Title: Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves

Dec 16, 2024

Shihan Wu, Ji Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Heng Tao Shen

Abstract: Prompt tuning (PT) has long been recognized as an effective and efficient paradigm for transferring large pre-trained vision-language models (VLMs) to downstream tasks by learning a tiny set of context vectors. Nevertheless, in this work, we reveal that freezing the parameters of VLMs while learning the context vectors neither facilitates the transfer of pre-trained knowledge nor significantly improves memory and time efficiency. Upon further investigation, we find that reducing both the length and width of the feature-gradient propagation flows of the full fine-tuning (FT) baseline is key to achieving effective and efficient knowledge transfer. Motivated by this, we propose Skip Tuning, a novel paradigm for adapting VLMs to downstream tasks. Unlike existing PT or adapter-based methods, Skip Tuning applies Layer-wise Skipping (LSkip) and Class-wise Skipping (CSkip) to the FT baseline without introducing extra context vectors or adapter modules. Extensive experiments across a wide spectrum of benchmarks demonstrate the superior effectiveness and efficiency of Skip Tuning over both PT and adapter-based methods. Code: https://github.com/Koorye/SkipTuning.
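The abstract does not spell out how LSkip and CSkip operate, so the following is only an illustrative sketch of one plausible reading, not the authors' implementation. It assumes that shortening the "length" of the propagation flow means caching the output of the early layers so they are run only once per sample, and that narrowing the "width" means computing the loss over a small subset of candidate classes per sample. All names here (`CachedPrefix`, `top_r_classes`, `subset_cross_entropy`) are hypothetical and do not come from the linked repository.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


class CachedPrefix:
    """Assumed reading of layer-wise skipping: run the early layers once
    per sample, cache the result, and reuse it on later passes."""

    def __init__(self, early_layers):
        self.early_layers = early_layers
        self.cache = {}
        self.prefix_calls = 0  # how many times the early layers actually ran

    def __call__(self, sample_id, x):
        if sample_id not in self.cache:
            for layer in self.early_layers:
                x = layer(x)
            self.cache[sample_id] = x
            self.prefix_calls += 1
        return self.cache[sample_id]


def top_r_classes(image_feat, class_feats, r, true_label):
    """Assumed reading of class-wise skipping: keep the r classes most
    similar to the image, always retaining the ground-truth class."""
    sims = [dot(image_feat, c) for c in class_feats]
    ranked = sorted(range(len(class_feats)), key=lambda i: sims[i], reverse=True)
    kept = ranked[:r]
    if true_label not in kept:
        kept[-1] = true_label  # swap out the weakest kept class
    return sorted(kept)


def subset_cross_entropy(image_feat, class_feats, kept, true_label):
    """Cross-entropy computed only over the kept class indices."""
    logits = [dot(image_feat, class_feats[i]) for i in kept]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[kept.index(true_label)]


# Toy usage: one "early layer" that doubles the input, four classes.
prefix = CachedPrefix([lambda v: [2.0 * t for t in v]])
feat = prefix("img0", [0.5, 0.0])   # early layers run here
feat = prefix("img0", [0.5, 0.0])   # second pass hits the cache
classes = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
kept = top_r_classes(feat, classes, r=2, true_label=0)
loss = subset_cross_entropy(feat, classes, kept, true_label=0)
print(prefix.prefix_calls, kept, loss)
```

In this toy setting the cached prefix removes repeated forward work through the early layers, and the class subset shrinks the softmax that the loss (and hence the gradient) is computed over. Whether the paper's LSkip and CSkip match these choices in detail should be checked against the paper and the repository.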
