Kian Shamsaie

Scanning Trojaned Models Using Out-of-Distribution Samples

Jan 28, 2025

Hossein Mirzaei, Ali Ansari, Bahar Dibaei Nia, Mojtaba Nafez, Moein Madadi, Sepehr Rezaee, Zeinab Sadat Taghavi, Arad Maleki, Kian Shamsaie, Mahdi Hajialilue (+3 more)

Abstract: Scanning for trojans (backdoors) in deep neural networks is crucial due to their significant real-world applications. There has been an increasing focus on developing general trojan scanning methods that remain effective across various trojan attacks. Despite advancements, there remains a shortage of methods that perform effectively without preconceived assumptions about the backdoor attack method. Additionally, we have observed that current methods struggle to identify classifiers trojaned using adversarial training. Motivated by these challenges, our study introduces a novel scanning method named TRODO (TROjan scanning by Detection of adversarial shifts in Out-of-distribution samples). TRODO leverages the concept of "blind spots": regions where trojaned classifiers erroneously identify out-of-distribution (OOD) samples as in-distribution (ID). We scan for these blind spots by adversarially shifting OOD samples towards in-distribution. The increased likelihood of perturbed OOD samples being classified as ID serves as a signature for trojan detection. TRODO is agnostic to both the trojan type and the label mapping, and it is effective even against adversarially trained trojaned classifiers. It is applicable even when training data is absent, and it demonstrates high accuracy and adaptability across various scenarios and datasets, highlighting its potential as a robust trojan scanning strategy.

* Accepted at the Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS) 2024. The code repository is available at: https://github.com/rohban-lab/TRODO
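
The scanning idea in the abstract (shift OOD samples adversarially towards in-distribution, then measure how much more "ID-like" they become) can be illustrated with a minimal sketch. This is not the authors' implementation; the repository above is the reference. Everything here is an assumption made for illustration: the classifier is a toy linear softmax model with weights `W` and bias `b`, the maximum softmax probability stands in for the ID score, the shift is a sign-gradient ascent inside an L-infinity ball, and the names `id_score`, `shift_toward_id`, `scan_signature`, `eps`, `step`, and `n_steps` are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def id_score(W, b, x):
    # Assumed ID score: maximum softmax probability of a linear classifier.
    return softmax(x @ W + b).max(axis=1)

def shift_toward_id(W, b, x, eps=0.5, step=0.1, n_steps=10):
    # Sign-gradient ascent on the log-probability of the currently
    # predicted class, kept within an L-infinity ball of radius eps.
    x_adv = x.copy()
    for _ in range(n_steps):
        p = softmax(x_adv @ W + b)
        onehot = np.eye(W.shape[1])[p.argmax(axis=1)]
        # d log p_k / d logits = onehot(k) - p; chain rule through x @ W.
        grad = (onehot - p) @ W.T
        x_adv = x_adv + step * np.sign(grad)
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

def scan_signature(W, b, x_ood, **attack_kwargs):
    # Mean increase in ID score after the shift. In this sketch, a larger
    # value would be read as the model being easier to push into treating
    # OOD inputs as ID.
    before = id_score(W, b, x_ood)
    after = id_score(W, b, shift_toward_id(W, b, x_ood, **attack_kwargs))
    return float((after - before).mean())
```

Under these assumptions, a scanner would compute `scan_signature` for the model under inspection on a set of OOD inputs and compare it against a threshold; how that threshold is chosen is not specified in the abstract and is left open here.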

A Contrastive Teacher-Student Framework for Novelty Detection under Style Shifts

Jan 28, 2025

Hossein Mirzaei, Mojtaba Nafez, Moein Madadi, Arad Maleki, Mahdi Hajialilue, Zeinab Sadat Taghavi, Sepehr Rezaee, Ali Ansari, Bahar Dibaei Nia, Kian Shamsaie (+5 more)

Abstract: There have been several efforts to improve Novelty Detection (ND) performance. However, ND methods often suffer significant performance drops under minor distribution shifts caused by changes in the environment, known as style shifts. This challenge arises from the ND setup, where the absence of out-of-distribution (OOD) samples during training causes the detector to be biased toward the dominant style features in the in-distribution (ID) data. As a result, the model mistakenly learns to correlate style with core features and uses this shortcut for detection. Robust ND is crucial for real-world applications like autonomous driving and medical imaging, where test samples may have different styles than the training data. Motivated by this, we propose a robust ND method that crafts an auxiliary OOD set with style features similar to the ID set but with different core features. A task-based knowledge distillation strategy is then used to distinguish core features from style features and to help our model rely on core features when discriminating between the crafted OOD set and the ID set. We verified the effectiveness of our method through extensive experimental evaluations on several datasets, including synthetic and real-world benchmarks, against nine different ND methods.

* The code repository is available at: https://github.com/rohban-lab/CTS
