Picture for Shreyash Agarwal

Shreyash Agarwal

"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks

Add code
Apr 10, 2022
Figure 1 for "That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks
Figure 2 for "That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks
Figure 3 for "That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks
Figure 4 for "That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks
Viaarxiv icon