Die Zhang

A Unified Game-Theoretic Interpretation of Adversarial Robustness

Nov 08, 2021

Jie Ren, Die Zhang, Yisen Wang, Lu Chen, Zhanpeng Zhou, Yiting Chen, Xu Cheng, Xin Wang, Meng Zhou, Jie Shi (+1 more)

Abstract: This paper provides a unified view to explain different adversarial attacks and defense methods, i.e., the view of multi-order interactions between input variables of DNNs. Based on the multi-order interaction, we discover that adversarial attacks mainly affect high-order interactions to fool the DNN. Furthermore, we find that the robustness of adversarially trained DNNs comes from category-specific low-order interactions. Our findings provide a potential method to unify adversarial perturbations and robustness, which can explain the existing defense methods in a principled way. In addition, our findings revise the previous inaccurate understanding of the shape bias of adversarially learned features.

* The previous version is arXiv:2103.07364; I mistakenly applied for a new ID for this paper.

Game-theoretic Understanding of Adversarially Learned Features

Mar 12, 2021

Jie Ren, Die Zhang, Yisen Wang, Lu Chen, Zhanpeng Zhou, Xu Cheng, Xin Wang, Yiting Chen, Jie Shi, Quanshi Zhang

Abstract: This paper aims to understand adversarial attacks and defense from a new perspective, i.e., the signal-processing behavior of DNNs. We give a new definition of the multi-order interaction in game theory, which satisfies six properties. With the multi-order interaction, we discover that adversarial attacks mainly affect high-order interactions to fool the DNN. Furthermore, we find that the robustness of adversarially trained DNNs comes from category-specific low-order interactions. Our findings provide more insight into, and revise the previous understanding of, the shape bias of adversarially learned features. In addition, the multi-order interaction can also explain the recoverability of adversarial examples.
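
The abstract does not state the formula for the multi-order interaction. As a rough illustration only, the sketch below assumes one common form: the order-m interaction between variables i and j is the average, over all contexts S of size m that exclude i and j, of v(S ∪ {i, j}) - v(S ∪ {i}) - v(S ∪ {j}) + v(S). The function name, the toy value function `toy_v`, and this exact form are assumptions made for illustration, not the authors' implementation.

```python
from itertools import combinations


def multi_order_interaction(n, v, i, j, m):
    """Average interaction between i and j over all contexts of size m.

    Assumed form (not taken from the abstract): the mean of
    v(S + {i, j}) - v(S + {i}) - v(S + {j}) + v(S)
    over every subset S of the other n - 2 variables with |S| = m.
    """
    others = [p for p in range(n) if p not in (i, j)]
    total, count = 0.0, 0
    for subset in combinations(others, m):
        s = frozenset(subset)
        total += v(s | {i, j}) - v(s | {i}) - v(s | {j}) + v(s)
        count += 1
    return total / count


def toy_v(s):
    # Hypothetical "model output" on a set of present variables:
    # a pairwise pattern {0, 1} plus a pattern that needs all four variables.
    pair = 1.0 if {0, 1} <= s else 0.0
    full = 1.0 if {0, 1, 2, 3} <= s else 0.0
    return pair + full


# Interaction between variables 0 and 1 at orders 0, 1 and 2.
orders = [multi_order_interaction(4, toy_v, 0, 1, m) for m in range(3)]
```

In this toy game the pairwise pattern contributes at every order, while the four-variable pattern only appears at the highest order (m = 2), which is the kind of low-order versus high-order distinction the abstract refers to.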

Interpreting Multivariate Interactions in DNNs

Oct 15, 2020

Hao Zhang, Yichen Xie, Longjie Zheng, Die Zhang, Quanshi Zhang

Abstract:This paper aims to explain deep neural networks (DNNs) from the perspective of multivariate interactions. In this paper, we define and quantify the significance of interactions among multiple input variables of the DNN. Input variables with strong interactions usually form a coalition and reflect prototype features, which are memorized and used by the DNN for inference. We define the significance of interactions based on the Shapley value, which is designed to assign the attribution value of each input variable to the inference. We have conducted experiments with various DNNs. Experimental results have demonstrated the effectiveness of the proposed method.
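
For readers unfamiliar with the Shapley value mentioned in the abstract, here is a minimal sketch of the standard exact computation on a tiny cooperative game. The value function `toy_model` is a hypothetical stand-in for a DNN's output on a subset of input variables; the paper's own interaction metric is not reproduced here.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, v):
    """Exact Shapley values for players 0..n-1 under value function v(frozenset)."""
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of i when it joins coalition s.
                phi[i] += weight * (v(s | {i}) - v(s))
    return phi


def toy_model(s):
    # Hypothetical output: variables 0 and 1 only matter together,
    # variable 2 contributes on its own.
    return (1.0 if {0, 1} <= s else 0.0) + (0.5 if 2 in s else 0.0)


phi = shapley_values(3, toy_model)
```

Variables 0 and 1 split the credit for their joint pattern, and the attributions sum to the difference between the output with all variables present and the output with none. Exact enumeration is exponential in the number of variables, so it is only practical for toy cases like this one.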

Interpreting Hierarchical Linguistic Interactions in DNNs

Jun 29, 2020

Die Zhang, Huilin Zhou, Xiaoyi Bao, Da Huo, Ruizhao Chen, Xu Cheng, Hao Zhang, Mengyue Wu, Quanshi Zhang

Abstract: This paper proposes a method to disentangle and quantify interactions among words that are encoded inside a DNN for natural language processing. We construct a tree to encode salient interactions extracted by the DNN. Six metrics are proposed to analyze properties of interactions between constituents in a sentence. The interaction is defined based on Shapley values of words, which are regarded as unbiased estimates of word contributions to the network prediction. Our method is used to quantify word interactions encoded inside BERT, ELMo, LSTM, CNN, and Transformer networks. Experimental results provide a new perspective for understanding these DNNs and demonstrate the effectiveness of our method.
