Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jieming Zhong

SurrogatePrompt: Bypassing the Safety Filter of Text-To-Image Models via Substitution

Sep 25, 2023

Zhongjie Ba, Jieming Zhong, Jiachen Lei, Peng Cheng, Qinglong Wang, Zhan Qin, Zhibo Wang, Kui Ren

Figure 1 for SurrogatePrompt: Bypassing the Safety Filter of Text-To-Image Models via Substitution

Figure 2 for SurrogatePrompt: Bypassing the Safety Filter of Text-To-Image Models via Substitution

Figure 3 for SurrogatePrompt: Bypassing the Safety Filter of Text-To-Image Models via Substitution

Figure 4 for SurrogatePrompt: Bypassing the Safety Filter of Text-To-Image Models via Substitution

Abstract:Advanced text-to-image models such as DALL-E 2 and Midjourney possess the capacity to generate highly realistic images, raising significant concerns regarding the potential proliferation of unsafe content. This includes adult, violent, or deceptive imagery of political figures. Despite claims of rigorous safety mechanisms implemented in these models to restrict the generation of not-safe-for-work (NSFW) content, we successfully devise and exhibit the first prompt attacks on Midjourney, resulting in the production of abundant photorealistic NSFW images. We reveal the fundamental principles of such prompt attacks and suggest strategically substituting high-risk sections within a suspect prompt to evade closed-source safety measures. Our novel framework, SurrogatePrompt, systematically generates attack prompts, utilizing large language models, image-to-text, and image-to-image modules to automate attack prompt creation at scale. Evaluation results disclose an 88% success rate in bypassing Midjourney's proprietary safety filter with our attack prompts, leading to the generation of counterfeit images depicting political figures in violent scenarios. Both subjective and objective assessments validate that the images generated from our attack prompts present considerable safety hazards.

* 14 pages, 11 figures

Via

Access Paper or Ask Questions

Locate and Verify: A Two-Stream Network for Improved Deepfake Detection

Sep 20, 2023

Chao Shuai, Jieming Zhong, Shuang Wu, Feng Lin, Zhibo Wang, Zhongjie Ba, Zhenguang Liu, Lorenzo Cavallaro, Kui Ren

Figure 1 for Locate and Verify: A Two-Stream Network for Improved Deepfake Detection

Figure 2 for Locate and Verify: A Two-Stream Network for Improved Deepfake Detection

Figure 3 for Locate and Verify: A Two-Stream Network for Improved Deepfake Detection

Figure 4 for Locate and Verify: A Two-Stream Network for Improved Deepfake Detection

Abstract:Deepfake has taken the world by storm, triggering a trust crisis. Current deepfake detection methods are typically inadequate in generalizability, with a tendency to overfit to image contents such as the background, which are frequently occurring but relatively unimportant in the training dataset. Furthermore, current methods heavily rely on a few dominant forgery regions and may ignore other equally important regions, leading to inadequate uncovering of forgery cues. In this paper, we strive to address these shortcomings from three aspects: (1) We propose an innovative two-stream network that effectively enlarges the potential regions from which the model extracts forgery evidence. (2) We devise three functional modules to handle the multi-stream and multi-scale features in a collaborative learning scheme. (3) Confronted with the challenge of obtaining forgery annotations, we propose a Semi-supervised Patch Similarity Learning strategy to estimate patch-level forged location annotations. Empirically, our method demonstrates significantly improved robustness and generalizability, outperforming previous methods on six benchmarks, and improving the frame-level AUC on Deepfake Detection Challenge preview dataset from 0.797 to 0.835 and video-level AUC on CelebDF$\_$v1 dataset from 0.811 to 0.847. Our implementation is available at https://github.com/sccsok/Locate-and-Verify.

* 10 pages, 8 figures, 60 references. This paper has been accepted for ACM MM 2023

Via

Access Paper or Ask Questions