Gihyuk Ko

Unsupervised Detection of Adversarial Examples with Model Explanations

Jul 22, 2021

Gihyuk Ko, Gyumin Lim

Abstract: Deep Neural Networks (DNNs) have shown remarkable performance in a diverse range of machine learning applications. However, it is widely known that DNNs are vulnerable to simple adversarial perturbations, which cause the model to misclassify inputs. In this paper, we propose a simple yet effective method to detect adversarial examples, using methods developed to explain the model's behavior. Our key observation is that adding small perturbations that are imperceptible to humans can lead to drastic changes in the model explanations, resulting in unusual or irregular forms of explanations. From this insight, we propose an unsupervised method for detecting adversarial examples that uses reconstructor networks trained only on model explanations of benign examples. Our evaluations on the MNIST handwritten digit dataset show that our method is capable of detecting adversarial examples generated by state-of-the-art algorithms with high confidence. To the best of our knowledge, this work is the first to suggest an unsupervised defense method using model explanations.

* AdvML@KDD'21
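The detection rule described in this abstract can be illustrated with a minimal sketch. It assumes each model explanation (for example, a saliency map) is flattened to a list of floats and that a reconstructor has already been fit on benign explanations only. The `mean_reconstructor` below is a hypothetical, deliberately trivial stand-in for the paper's reconstructor networks, and the quantile-based threshold is likewise an assumption, not the authors' exact procedure. An input is flagged when its explanation is reconstructed noticeably worse than benign explanations are.

```python
import math
from typing import Callable, List, Sequence

Explanation = List[float]
Reconstructor = Callable[[Sequence[float]], Explanation]


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between an explanation and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def mean_reconstructor(benign: Sequence[Sequence[float]]) -> Reconstructor:
    """Hypothetical stand-in for a trained reconstructor network: it always
    'reconstructs' the per-feature mean of the benign explanations."""
    n = len(benign)
    mean = [sum(col) / n for col in zip(*benign)]
    return lambda _x: list(mean)


def fit_threshold(benign, reconstruct: Reconstructor, quantile: float = 0.95) -> float:
    """Assumed rule: threshold = nearest-rank quantile of benign reconstruction errors."""
    errors = sorted(reconstruction_error(e, reconstruct(e)) for e in benign)
    idx = min(len(errors) - 1, max(0, math.ceil(quantile * len(errors)) - 1))
    return errors[idx]


def is_adversarial(explanation, reconstruct: Reconstructor, threshold: float) -> bool:
    """Flag an input whose explanation reconstructs worse than benign ones."""
    return reconstruction_error(explanation, reconstruct(explanation)) > threshold


if __name__ == "__main__":
    benign = [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [0.0, 1.0, 0.1]]
    reconstruct = mean_reconstructor(benign)
    threshold = fit_threshold(benign, reconstruct, quantile=1.0)
    print(is_adversarial([1.0, 0.0, 1.0], reconstruct, threshold))  # True
    print(is_adversarial([0.0, 1.0, 0.0], reconstruct, threshold))  # False
```

Because only benign explanations are used to fit both the reconstructor and the threshold, no adversarial examples are needed at training time, which is what makes the scheme unsupervised.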

Use Privacy in Data-Driven Systems: Theory and Experiments with Machine Learnt Programs

Sep 07, 2017

Anupam Datta, Matthew Fredrikson, Gihyuk Ko, Piotr Mardziel, Shayak Sen

Abstract: This paper presents an approach to formalizing and enforcing a class of use privacy properties in data-driven systems. In contrast to prior work, we focus on use restrictions on proxies (i.e., strong predictors) of protected information types. Our definition relates proxy use to intermediate computations that occur in a program, and identifies two essential properties that characterize this behavior: 1) its result is strongly associated with the protected information type in question, and 2) it is likely to causally affect the final output of the program. For a specific instantiation of this definition, we present a program analysis technique that detects instances of proxy use in a model and provides a witness that identifies which parts of the corresponding program exhibit the behavior. Recognizing that not all instances of proxy use of a protected information type are inappropriate, we make use of a normative judgment oracle that makes this inappropriateness determination for a given witness. Our repair algorithm uses the witness of an inappropriate proxy use to transform the model into one that provably does not exhibit proxy use, while avoiding changes that unduly affect classification accuracy. Using a corpus of social datasets, our evaluation shows that these algorithms are able to detect proxy use instances that would be difficult to find using existing techniques, and subsequently remove them while maintaining acceptable classification performance.

* Extended version of the CCS 2017 camera-ready paper: several new discussions, and complexity results added to the appendix
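The two properties in this definition (association with the protected type, and causal influence on the output) can be illustrated with a small check. This is only a sketch under stated assumptions, not the paper's program analysis: association is measured here with normalized mutual information, influence is estimated by replacing the intermediate value with one drawn from its empirical distribution over the dataset, and the `sub`/`outer` split of the program, the thresholds `eps` and `delta`, and all function names are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter
from typing import Hashable, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy in bits."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())


def normalized_mutual_information(xs, ys) -> float:
    """Assumed association measure: I(X;Y) / min(H(X), H(Y)); 0.0 if either is constant."""
    h_x, h_y = entropy(xs), entropy(ys)
    if h_x == 0 or h_y == 0:
        return 0.0
    mutual_info = h_x + h_y - entropy(list(zip(xs, ys)))
    return mutual_info / min(h_x, h_y)


def influence(rows, sub, outer, samples=1000, seed=0) -> float:
    """Estimate how often the output changes when the intermediate value sub(r)
    is replaced by a value drawn from its empirical distribution over rows."""
    rng = random.Random(seed)
    values = [sub(r) for r in rows]
    changed = 0
    for _ in range(samples):
        r = rng.choice(rows)
        v_prime = rng.choice(values)
        if outer(r, v_prime) != outer(r, sub(r)):
            changed += 1
    return changed / samples


def is_proxy_use(rows, protected, sub, outer, eps, delta) -> bool:
    """Flag sub as a proxy use if it is associated (>= eps) and influential (>= delta)."""
    association = normalized_mutual_information([sub(r) for r in rows], protected)
    return association >= eps and influence(rows, sub, outer) >= delta


if __name__ == "__main__":
    # Toy data: "zip" perfectly encodes the protected group; "income" is independent of it.
    rows = [{"zip": z, "income": i} for z in (0, 1) for i in (30, 70)] * 5
    protected = [r["zip"] for r in rows]

    def decide(row, value):
        # Toy program: the final output is just the intermediate value.
        return value == 1

    print(is_proxy_use(rows, protected, lambda r: r["zip"], decide, 0.8, 0.2))  # True
    print(is_proxy_use(rows, protected, lambda r: int(r["income"] > 50), decide, 0.8, 0.2))  # False
```

Both toy sub-computations influence the output equally; only the one that is strongly associated with the protected attribute is flagged, which mirrors the requirement that both properties hold.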

Proxy Non-Discrimination in Data-Driven Systems

Jul 25, 2017

Anupam Datta, Matt Fredrikson, Gihyuk Ko, Piotr Mardziel, Shayak Sen

Abstract: Machine-learnt systems inherit biases against protected classes (historically disparaged groups) from their training data. Usually, these biases are not explicit: they rely on subtle correlations discovered by training algorithms, and are therefore difficult to detect. We formalize proxy discrimination in data-driven systems, a class of properties indicative of bias, as the presence of protected class correlates that have causal influence on the system's output. We evaluate an implementation on a corpus of social datasets, demonstrating how to validate systems against these properties and to repair violations where they occur.

* arXiv admin note: substantial text overlap with arXiv:1705.07807
