Title: SafeGen: Mitigating Unsafe Content Generation in Text-to-Image Models

Apr 10, 2024

Xinfeng Li, Yuchen Yang, Jiangyi Deng, Chen Yan, Yanjiao Chen, Xiaoyu Ji, Wenyuan Xu

Abstract: Text-to-image (T2I) models, such as Stable Diffusion, have exhibited remarkable performance in generating high-quality images from text descriptions in recent years. However, text-to-image models may be tricked into generating not-safe-for-work (NSFW) content, particularly in sexual scenarios. Existing countermeasures mostly focus on filtering inappropriate inputs and outputs, or suppressing improper text embeddings, which can block explicit NSFW-related content (e.g., naked or sexy) but may still be vulnerable to adversarial prompts: inputs that appear innocent but are ill-intended. In this paper, we present SafeGen, a framework to mitigate unsafe content generation by text-to-image models in a text-agnostic manner. The key idea is to eliminate unsafe visual representations from the model regardless of the text input. In this way, the text-to-image model is resistant to adversarial prompts since unsafe visual representations are obstructed from within. Extensive experiments conducted on four datasets demonstrate SafeGen's effectiveness in mitigating unsafe content generation while preserving the high fidelity of benign images. SafeGen outperforms eight state-of-the-art baseline methods and achieves 99.1% sexual content removal performance. Furthermore, our constructed benchmark of adversarial prompts provides a basis for future development and evaluation of anti-NSFW-generation methods.
