Title: Identifying a Training-Set Attack's Target Using Renormalized Influence Estimation

Jan 25, 2022

Zayd Hammoudeh, Daniel Lowd

Abstract: Targeted training-set attacks inject malicious instances into the training set to cause a trained model to mislabel one or more specific test instances. This work proposes the task of target identification, which determines whether a specific test instance is the target of a training-set attack. This can then be combined with adversarial-instance identification to find (and remove) the attack instances, mitigating the attack with minimal impact on other predictions. Rather than focusing on a single attack method or data modality, we build on influence estimation, which quantifies each training instance's contribution to a model's prediction. We show that existing influence estimators' poor practical performance often derives from their over-reliance on instances and iterations with large losses. Our renormalized influence estimators fix this weakness; they far outperform the original ones at identifying influential groups of training examples in both adversarial and non-adversarial settings, even finding up to 100% of adversarial training instances with no clean-data false positives. Target identification then simplifies to detecting test instances with anomalous influence values. We demonstrate our method's generality on backdoor and poisoning attacks across various data domains including text, vision, and speech. Our source code is available at https://github.com/ZaydH/target_identification.
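
The abstract's point about over-reliance on large losses can be illustrated with a toy sketch. It assumes, as one plausible reading that the abstract itself does not spell out, that influence is scored by a gradient dot product and that renormalization means scaling each gradient to unit norm first, which turns the score into a cosine similarity. The function names and the gradient values are hypothetical and are not taken from the authors' repository.

```python
import numpy as np


def dot_influence(train_grads, test_grad):
    """Unnormalized score: dot product of each training gradient with the test gradient."""
    return train_grads @ test_grad


def renormalized_influence(train_grads, test_grad, eps=1e-12):
    """Assumed renormalization: scale every gradient to unit norm before the dot product."""
    train_norms = np.linalg.norm(train_grads, axis=1, keepdims=True)
    train_unit = train_grads / (train_norms + eps)
    test_unit = test_grad / (np.linalg.norm(test_grad) + eps)
    return train_unit @ test_unit


# Made-up gradients for two training instances (rows) and one test instance.
train_grads = np.array([
    [10.0, 10.0],  # large-loss instance: big gradient, only partly aligned with the test gradient
    [0.1, 0.0],    # small-loss instance: tiny gradient, perfectly aligned with the test gradient
])
test_grad = np.array([1.0, 0.0])

raw_scores = dot_influence(train_grads, test_grad)
renorm_scores = renormalized_influence(train_grads, test_grad)

# Index of the top-ranked training instance under each score.
top_raw = int(np.argmax(raw_scores))
top_renorm = int(np.argmax(renorm_scores))
```

In this toy case the raw dot product ranks the large-gradient instance first purely because of its magnitude, while the unit-norm version ranks the perfectly aligned instance first. Real estimators aggregate such terms over model parameters and training checkpoints, which this sketch leaves out.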
