Title: How to Train Vision Transformer on Small-scale Datasets?

Oct 13, 2022

Hanan Gani, Muzammal Naseer, Mohammad Yaqub

Abstract: Vision Transformer (ViT), an architecture radically different from convolutional neural networks, offers multiple advantages, including design simplicity, robustness, and state-of-the-art performance on many vision tasks. However, in contrast to convolutional neural networks, Vision Transformer lacks inherent inductive biases. Therefore, successful training of such models is mainly attributed to pre-training on large-scale datasets such as ImageNet with 1.2M images or JFT with 300M images. This hinders the direct adaptation of Vision Transformer to small-scale datasets. In this work, we show that self-supervised inductive biases can be learned directly from small-scale datasets and serve as an effective weight initialization scheme for fine-tuning. This allows these models to be trained without large-scale pre-training or changes to the model architecture or loss functions. We present thorough experiments to successfully train monolithic and non-monolithic Vision Transformers on five small datasets, including CIFAR10/100, CINIC10, SVHN, and Tiny-ImageNet, and two fine-grained datasets: Aircraft and Cars. Our approach consistently improves the performance of Vision Transformers while retaining their properties, such as attention to salient regions and higher robustness. Our code and pre-trained models are available at: https://github.com/hananshafi/vits-for-small-scale-datasets.

* Accepted at BMVC 2022
