Title: Patch-Fool: Are Vision Transformers Always Robust Against Adversarial Perturbations?

Apr 05, 2022

Yonggan Fu, Shunyao Zhang, Shang Wu, Cheng Wan, Yingyan Lin

Abstract: Vision transformers (ViTs) have recently set off a new wave in neural architecture design thanks to their record-breaking performance in various vision tasks. In parallel, to fulfill the goal of deploying ViTs in real-world vision applications, their robustness against potential malicious attacks has gained increasing attention. In particular, recent works show that ViTs are more robust against adversarial attacks than convolutional neural networks (CNNs), and conjecture that this is because ViTs focus more on capturing global interactions among different input/feature patches, which improves their robustness to the local perturbations imposed by adversarial attacks. In this work, we ask an intriguing question: "Under what kinds of perturbations do ViTs become more vulnerable learners compared to CNNs?" Driven by this question, we first conduct a comprehensive experiment on the robustness of both ViTs and CNNs under various existing adversarial attacks to understand the underlying reasons for their robustness. Based on the insights drawn, we then propose a dedicated attack framework, dubbed Patch-Fool, that fools the self-attention mechanism by attacking its basic component (i.e., a single patch) with a series of attention-aware optimization techniques. Interestingly, our Patch-Fool framework shows for the first time that ViTs are not necessarily more robust than CNNs against adversarial perturbations. In particular, we find that ViTs are more vulnerable learners than CNNs under our Patch-Fool attack, a finding that is consistent across extensive experiments. Moreover, the observations from Sparse Patch-Fool and Mild Patch-Fool, two variants of Patch-Fool, suggest that the perturbation density and the perturbation strength on each patch are the key factors that influence the robustness ranking between ViTs and CNNs.
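To make the idea of attacking "a single patch" concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the target patch is the one receiving the most total attention (one plausible reading of "attention-aware"), that the attention tensor has shape `(heads, tokens, tokens)` with no class token, and that the perturbation `delta` comes from some optimizer not shown here. The function names `select_patch`, `patch_mask`, and `perturb_single_patch` are hypothetical and used only for illustration.

```python
import numpy as np


def select_patch(attn):
    """Pick the patch (key token) that receives the most total attention.

    attn: array of shape (heads, tokens, tokens); rows are queries,
    columns are keys. Assumption: no class token, so token i is patch i.
    """
    received = attn.sum(axis=(0, 1))  # total attention each key receives
    return int(np.argmax(received))


def patch_mask(image_hw, patch_size, patch_index):
    """Binary (H, W) mask that is 1 inside the chosen patch, 0 elsewhere."""
    height, width = image_hw
    patches_per_row = width // patch_size
    row, col = divmod(patch_index, patches_per_row)
    mask = np.zeros((height, width), dtype=np.float32)
    mask[row * patch_size:(row + 1) * patch_size,
         col * patch_size:(col + 1) * patch_size] = 1.0
    return mask


def perturb_single_patch(image, delta, mask):
    """Apply delta only inside the masked patch; keep pixels in [0, 1].

    image, delta: arrays of shape (H, W, C); mask: array of shape (H, W).
    """
    return np.clip(image + delta * mask[..., None], 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical setup: 224x224 image, 16x16 patches -> 14 * 14 = 196 tokens.
    attn = rng.random((12, 196, 196))
    image = rng.random((224, 224, 3)).astype(np.float32)
    delta = rng.normal(scale=0.1, size=image.shape).astype(np.float32)

    idx = select_patch(attn)
    mask = patch_mask((224, 224), 16, idx)
    adversarial = perturb_single_patch(image, delta, mask)
    print("perturbed patch index:", idx)
    print("pixels changed:", int((adversarial != image).any(axis=-1).sum()))
```

In a real attack, `delta` would be updated iteratively against the model's loss while the mask keeps every pixel outside the selected patch untouched. The mask is the part of the sketch that corresponds to the single-patch constraint described in the abstract.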

* Accepted at ICLR 2022

View paper on OpenReview
