Tor Johansen

Weight Copy and Low-Rank Adaptation for Few-Shot Distillation of Vision Transformers

Apr 17, 2024

Diana-Nicoleta Grigore, Mariana-Iuliana Georgescu, Jon Alvarez Justo, Tor Johansen, Andreea Iuliana Ionescu, Radu Tudor Ionescu

Abstract: Few-shot knowledge distillation recently emerged as a viable approach to harness the knowledge of large-scale pre-trained models, using limited data and computational resources. In this paper, we propose a novel few-shot feature distillation approach for vision transformers. Our approach is based on two key steps. Leveraging the fact that vision transformers have a consistent depth-wise structure, we first copy the weights from intermittent layers of existing pre-trained vision transformers (teachers) into shallower architectures (students), where the intermittence factor controls the complexity of the student transformer with respect to its teacher. Next, we employ an enhanced version of Low-Rank Adaptation (LoRA) to distill knowledge into the student in a few-shot scenario, aiming to recover the information processing carried out by the skipped teacher layers. We present comprehensive experiments with supervised and self-supervised transformers as teachers, on five data sets from various domains, including natural, medical and satellite images. The empirical results confirm the superiority of our approach over competitive baselines. Moreover, the ablation results demonstrate the usefulness of each component of the proposed pipeline.
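
The two steps described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the choice to keep every `factor`-th block starting at index 0, and the use of plain (not "enhanced") LoRA are all assumptions made for illustration. Blocks are represented as simple dictionaries and matrices as nested lists.

```python
from copy import deepcopy


def select_teacher_layers(num_teacher_layers, factor):
    """Indices of teacher blocks kept for the student.

    Assumption: every `factor`-th block is kept, starting at index 0.
    The abstract does not specify which blocks are selected.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return list(range(0, num_teacher_layers, factor))


def copy_weights(teacher_blocks, factor):
    """Build a shallower student by deep-copying the selected teacher blocks."""
    kept = select_teacher_layers(len(teacher_blocks), factor)
    return [deepcopy(teacher_blocks[i]) for i in kept]


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(W, A, B, alpha, rank):
    """Effective weight W + (alpha / rank) * (B @ A) of standard LoRA.

    W (d_out x d_in) stays frozen; only A (rank x d_in) and
    B (d_out x rank) would be trained during distillation.
    """
    delta = matmul(B, A)
    scale = alpha / rank
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(W, delta)
    ]


# Hypothetical 12-block teacher compressed with an intermittence factor of 3.
teacher = [{"W": [[float(i)]]} for i in range(12)]
student = copy_weights(teacher, 3)  # keeps teacher blocks 0, 3, 6 and 9
```

With `B` initialised to zeros, as is common for LoRA, `lora_weight` returns the copied weight unchanged, so the student starts from the teacher's weights and the low-rank terms only gradually account for the skipped layers.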
