Title: Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation

Oct 21, 2024

Anh Bui, Long Vuong, Khanh Doan, Trung Le, Paul Montague, Tamas Abraham, Dinh Phung

Abstract: Diffusion models excel at generating visually striking content from text but can inadvertently produce undesirable or harmful content when trained on unfiltered internet data. A practical solution is to selectively remove target concepts from the model, but this may impact the remaining concepts. Prior approaches have tried to balance this by introducing a loss term to preserve neutral content or a regularization term to minimize changes in the model parameters, yet resolving this trade-off remains challenging. In this work, we propose to identify and preserve the concepts most affected by parameter changes, termed adversarial concepts. This approach ensures stable erasure with minimal impact on the other concepts. We demonstrate the effectiveness of our method using the Stable Diffusion model, showing that it outperforms state-of-the-art erasure methods in eliminating unwanted content while maintaining the integrity of other unrelated elements. Our code is available at https://github.com/tuananhbui89/Erasing-Adversarial-Preservation.
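The idea of picking out the concept "most affected by parameter changes" can be pictured with a toy sketch. This is not the authors' implementation and does not use Stable Diffusion. It assumes a model is just a small weight matrix, a concept is a plain vector, and "affected" means the squared distance between the original and edited model outputs. All names (`forward`, `output_shift`, `most_affected_concept`) and all numbers are hypothetical and chosen only for illustration.

```python
# Toy illustration only. Nothing here comes from the paper's repository:
# a "model" is a weight matrix and a "concept" is a small vector.

def forward(weights, x):
    """Linear toy model: multiply the weight matrix by the concept vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def output_shift(original, edited, x):
    """Squared L2 distance between the two models' outputs for concept x."""
    before, after = forward(original, x), forward(edited, x)
    return sum((b - a) ** 2 for b, a in zip(before, after))

def most_affected_concept(original, edited, concepts):
    """Return the name of the candidate concept whose output moved the most."""
    return max(
        concepts,
        key=lambda name: output_shift(original, edited, concepts[name]),
    )

# Hypothetical weights before and after an "erasure" update that
# happens to change only the second output axis.
original = [[1.0, 0.0], [0.0, 1.0]]
edited = [[1.0, 0.0], [0.0, 0.2]]

# Hypothetical candidate concepts that should be left intact.
concepts = {
    "cat": [1.0, 0.0],
    "dog": [0.8, 0.6],
    "tree": [0.1, 1.0],
}

# The "adversarial" concept in this toy sense is the one hurt most by the edit.
adversarial = most_affected_concept(original, edited, concepts)

# A preservation term would then penalize the shift on that concept.
preservation_loss = output_shift(original, edited, concepts[adversarial])
print(adversarial, preservation_loss)
```

In this toy setup the preservation penalty targets the single worst-affected candidate, not a fixed neutral concept, which is the contrast the abstract draws with prior approaches. How the actual method searches for such concepts and combines the terms is described in the paper and its code, not here.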
