Title: Prior Knowledge-Guided Attention in Self-Supervised Vision Transformers

Sep 07, 2022

Kevin Miao, Akash Gokul, Raghav Singh, Suzanne Petryk, Joseph Gonzalez, Kurt Keutzer, Trevor Darrell, Colorado Reed

Abstract: Recent trends in self-supervised representation learning have focused on removing inductive biases from training pipelines. However, inductive biases can be useful when limited data are available or when they provide additional insight into the underlying data distribution. We present spatial prior attention (SPAN), a framework that takes advantage of consistent spatial and semantic structure in unlabeled image datasets to guide Vision Transformer attention. SPAN operates by regularizing attention masks from separate transformer heads to follow various priors over semantic regions. These priors can be derived from data statistics or from a single labeled sample provided by a domain expert. We study SPAN through several detailed real-world scenarios, including medical image analysis and visual quality assurance. We find that the resulting attention masks are more interpretable than those derived from domain-agnostic pretraining. SPAN produces a 58.7 mAP improvement for lung and heart segmentation. We also find that our method yields a 2.2 mAUC improvement compared to domain-agnostic pretraining when transferring the pretrained model to a downstream chest disease classification task. Lastly, we show that SPAN pretraining leads to higher downstream classification performance in low-data regimes compared to domain-agnostic pretraining.
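
The abstract does not say which loss SPAN uses, so the following is only an illustrative sketch of what "regularizing attention masks from separate heads to follow priors" could look like. It assumes each head has a per-patch attention vector (for example, from the class token to the image patches) and a matching non-negative prior mask, and it penalizes the KL divergence between the two. The function name `attention_prior_loss`, the choice of KL divergence, and the array shapes are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def attention_prior_loss(attn, priors, eps=1e-8):
    """Hypothetical regularizer: mean KL(prior || attention) over heads.

    attn:   array of shape (heads, patches), non-negative attention weights
            per head (assumed layout, not taken from the paper).
    priors: array of shape (heads, patches), non-negative prior masks,
            one semantic region per head (assumed layout).
    """
    attn = np.asarray(attn, dtype=np.float64)
    priors = np.asarray(priors, dtype=np.float64)

    # Normalize each head's row so it sums to (approximately) one.
    p = priors / (priors.sum(axis=1, keepdims=True) + eps)
    q = attn / (attn.sum(axis=1, keepdims=True) + eps)

    # KL divergence per head; eps keeps the logs finite where a mask is zero.
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)
    return float(kl.mean())


# Two heads over four patches: head 0 should attend to the first two
# patches, head 1 to the last two (toy priors, for illustration only).
toy_priors = np.array([[1.0, 1.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0, 1.0]])
uniform_attn = np.ones((2, 4))

loss_uniform = attention_prior_loss(uniform_attn, toy_priors)
loss_matched = attention_prior_loss(toy_priors, toy_priors)
```

In this toy setup, attention that already matches the priors incurs essentially zero penalty, while uniform attention is penalized. In a training pipeline, a term of this kind would presumably be added to the self-supervised objective with a weighting coefficient, but that detail is likewise an assumption.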
