Azer Khan

Output Randomization: A Novel Defense for both White-box and Black-box Adversarial Models

Jul 08, 2021

Daniel Park, Haidar Khan, Azer Khan, Alex Gittens, Bülent Yener

Abstract: Adversarial examples pose a threat to deep neural network models in a variety of scenarios, from the "white box" setting, where the adversary has complete knowledge of the model, to the opposite "black box" setting. In this paper, we explore the use of output randomization as a defense against attacks in both the black box and white box models and propose two defenses. In the first defense, we propose output randomization at test time to thwart finite difference attacks in black box settings. Since this type of attack relies on repeated queries to the model to estimate gradients, we investigate the use of randomization to prevent such adversaries from successfully creating adversarial examples. We empirically show that this defense can limit the success rate of a black box adversary using the Zeroth Order Optimization attack to 0%. Secondly, we propose output randomization training as a defense against white box adversaries. Unlike prior approaches that use randomization, our defense does not require its use at test time, eliminating the Backward Pass Differentiable Approximation attack, which was shown to be effective against other randomization defenses. Additionally, this defense has low overhead and is easily implemented, allowing it to be used together with other defenses across various model architectures. We evaluate output randomization training against the Projected Gradient Descent (PGD) attacker and show that the defense can reduce the PGD attack's success rate to 12% when using cross-entropy loss.

* This is a substantially changed version of an earlier preprint (arXiv:1905.09871)
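
To make the first defense in the abstract above concrete, here is a minimal sketch of why noise on the model output disrupts a finite difference gradient estimate. It is an illustration under stated assumptions, not the paper's implementation: the toy softmax classifier, the Gaussian noise model, and the names softmax_probs, randomized_query, and finite_difference_grad are all hypothetical, and the values of sigma and h are arbitrary.

```python
import numpy as np

def softmax_probs(x, W):
    """Toy linear softmax classifier standing in for a deep network (hypothetical)."""
    logits = W @ x
    z = np.exp(logits - logits.max())
    return z / z.sum()

def analytic_grad(x, W, cls):
    """Exact gradient of the class-`cls` probability with respect to the input."""
    p = softmax_probs(x, W)
    return p[cls] * (W[cls] - p @ W)

def randomized_query(x, W, sigma, rng):
    """Model output with zero-mean Gaussian noise added (assumed noise model)."""
    p = softmax_probs(x, W)
    return p + rng.normal(0.0, sigma, size=p.shape)

def finite_difference_grad(query, x, cls, h=1e-4):
    """Symmetric finite difference estimate, using two queries per input dimension."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (query(x + step)[cls] - query(x - step)[cls]) / (2.0 * h)
    return grad

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))   # 3 classes, 5 input dimensions
x = rng.normal(size=5)

true_grad = analytic_grad(x, W, cls=0)
clean_est = finite_difference_grad(lambda v: softmax_probs(v, W), x, cls=0)
noisy_est = finite_difference_grad(
    lambda v: randomized_query(v, W, 1e-2, rng), x, cls=0
)

clean_error = np.linalg.norm(clean_est - true_grad)
noisy_error = np.linalg.norm(noisy_est - true_grad)
print("error without randomization:", clean_error)
print("error with randomization:   ", noisy_error)
```

In this sketch the two noise draws in each difference are independent, so each coordinate of the estimate picks up an error with standard deviation sigma / (sqrt(2) * h), roughly 70 for the values used here. That is far larger than the entries of the true gradient, which are bounded by the rows of W scaled by a probability.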

Thwarting finite difference adversarial attacks with output randomization

May 23, 2019

Haidar Khan, Daniel Park, Azer Khan, Bülent Yener

Abstract: Adversarial examples pose a threat to deep neural network models in a variety of scenarios, from settings where the adversary has complete knowledge of the model to the opposite "black box" setting. Black box attacks are particularly threatening because the adversary only needs access to the input and output of the model. Defending against black box adversarial example generation attacks is paramount, as currently proposed defenses are not effective. Since these types of attacks rely on repeated queries to the model to estimate gradients over input dimensions, we investigate the use of randomization to prevent such adversaries from successfully creating adversarial examples. Randomization applied to the output of the deep neural network model can confuse potential attackers; however, it introduces a tradeoff between accuracy and robustness. We show that for certain types of randomization, we can bound the probability of introducing errors by carefully setting distributional parameters. For the particular case of finite difference black box attacks, we quantify the error introduced by the defense in the finite difference estimate of the gradient. Lastly, we show empirically that the defense can thwart two adaptive black box adversarial attack algorithms.
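
The accuracy side of the tradeoff described in the abstract above can be sketched as follows. This is a hedged illustration, not the bound derived in the paper: it assumes independent zero-mean Gaussian noise on each output, and the probability vector, the sigma values, and the function names are made up for the example. The bound shown is a plain union bound over the competing classes.

```python
import math
import numpy as np

def flip_probability_mc(probs, sigma, trials, rng):
    """Monte Carlo estimate of how often the noise changes the predicted class."""
    noise = rng.normal(0.0, sigma, size=(trials, probs.size))
    noisy_pred = np.argmax(probs + noise, axis=1)
    return float(np.mean(noisy_pred != np.argmax(probs)))

def flip_probability_union_bound(probs, sigma):
    """Union bound on the flip probability under independent Gaussian noise."""
    top = int(np.argmax(probs))
    bound = 0.0
    for k, p in enumerate(probs):
        if k == top:
            continue
        gap = probs[top] - p
        # The difference of two independent N(0, sigma^2) draws is N(0, 2*sigma^2),
        # so P(difference > gap) = 0.5 * erfc(gap / (2 * sigma)).
        bound += 0.5 * math.erfc(gap / (2.0 * sigma))
    return min(1.0, bound)

rng = np.random.default_rng(0)
probs = np.array([0.7, 0.2, 0.1])   # hypothetical model output for one input
sigmas = [0.05, 0.2, 0.5]           # assumed noise scales

mc_rates = [flip_probability_mc(probs, s, 100_000, rng) for s in sigmas]
bounds = [flip_probability_union_bound(probs, s) for s in sigmas]

for s, rate, bound in zip(sigmas, mc_rates, bounds):
    print(f"sigma={s}: empirical flip rate {rate:.4f}, union bound {bound:.4f}")
```

Under these assumptions, a small sigma relative to the gap between the top two outputs leaves the prediction almost always unchanged, while a larger sigma adds more noise to each query at the cost of more frequent misclassification.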
