Oscar Chew

Defending Text-to-image Diffusion Models: Surprising Efficacy of Textual Perturbations Against Backdoor Attacks

Aug 28, 2024

Oscar Chew, Po-Yi Lu, Jayden Lin, Hsuan-Tien Lin

Abstract: Text-to-image diffusion models have been widely adopted in real-world applications due to their ability to generate realistic images from textual descriptions. However, recent studies have shown that these models are vulnerable to backdoor attacks. Despite the significant threat posed by backdoor attacks on text-to-image diffusion models, countermeasures remain under-explored. In this paper, we address this research gap by demonstrating that state-of-the-art backdoor attacks against text-to-image diffusion models can be effectively mitigated by a surprisingly simple defense strategy: textual perturbation. Experiments show that textual perturbations are effective in defending against state-of-the-art backdoor attacks with minimal sacrifice to generation quality. We analyze the efficacy of textual perturbation from two angles: text embedding space and cross-attention maps. These analyses further explain how backdoor attacks have compromised text-to-image diffusion models, providing insights for studying future attack and defense strategies. Our code is available at https://github.com/oscarchew/t2i-backdoor-defense.

* ECCV 2024 Workshop The Dark Side of Generative AIs and Beyond
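
To make the idea of textual perturbation concrete, here is a minimal sketch of one possible character-level perturbation applied to a prompt before it would be passed to a text-to-image model. The function name `perturb_prompt`, the adjacent-letter swap, and the `rate` parameter are illustrative assumptions, not the perturbations used in the paper; see the authors' repository for their actual implementation.

```python
import random
from typing import Optional


def perturb_prompt(prompt: str, rate: float = 0.1, seed: Optional[int] = None) -> str:
    """Hypothetical perturbation: swap two adjacent inner letters in some words.

    Each word of at least four characters is perturbed with probability `rate`.
    The first and last characters of a word are never moved.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    perturbed_words = []
    for word in prompt.split():
        chars = list(word)
        if len(chars) >= 4 and rng.random() < rate:
            # Pick an inner position i so that i and i + 1 are both inner letters.
            i = rng.randrange(1, len(chars) - 2)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
        perturbed_words.append("".join(chars))
    return " ".join(perturbed_words)


if __name__ == "__main__":
    print(perturb_prompt("a photo of a beautiful dog", rate=0.5, seed=0))
```

In a defense setting, a function like this would sit in front of the text encoder, so the model only ever sees the perturbed prompt. How well any particular perturbation disrupts a given backdoor trigger is an empirical question that this sketch does not answer.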

Understanding and Mitigating Spurious Correlations in Text Classification

May 23, 2023

Oscar Chew, Kuan-Hao Huang, Kai-Wei Chang, Hsuan-Tien Lin

Abstract: Recent work has shown that deep learning models are prone to exploit spurious correlations that are present in the training set, yet may not hold true in general. For example, a sentiment classifier may erroneously learn that the token "spielberg" is always tied to positive movie reviews. Relying on spurious correlations may lead to significant degradation in generalizability and should be avoided. In this paper, we propose a neighborhood analysis framework to explain how exactly language models exploit spurious correlations. Driven by the analysis, we propose a family of regularization methods, NFL (do Not Forget your Language), to prevent this behavior. Experiments on two text classification tasks show that NFL brings a significant improvement over standard fine-tuning in terms of robustness without sacrificing in-distribution accuracy.
