Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tamas Abraham

Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them

Jan 31, 2025

Anh Bui, Trang Vu, Long Vuong, Trung Le, Paul Montague, Tamas Abraham, Junae Kim, Dinh Phung

Abstract:Concept erasure has emerged as a promising technique for mitigating the risk of harmful content generation in diffusion models by selectively unlearning undesirable concepts. The common principle of previous works to remove a specific concept is to map it to a fixed generic concept, such as a neutral concept or just an empty text prompt. In this paper, we demonstrate that this fixed-target strategy is suboptimal, as it fails to account for the impact of erasing one concept on the others. To address this limitation, we model the concept space as a graph and empirically analyze the effects of erasing one concept on the remaining concepts. Our analysis uncovers intriguing geometric properties of the concept space, where the influence of erasing a concept is confined to a local region. Building on this insight, we propose the Adaptive Guided Erasure (AGE) method, which \emph{dynamically} selects optimal target concepts tailored to each undesirable concept, minimizing unintended side effects. Experimental results show that AGE significantly outperforms state-of-the-art erasure methods on preserving unrelated concepts while maintaining effective erasure performance. Our code is published at {https://github.com/tuananhbui89/Adaptive-Guided-Erasure}.

* International Conference on Learning Representations 2025

Via

Access Paper or Ask Questions

Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation

Oct 21, 2024

Anh Bui, Long Vuong, Khanh Doan, Trung Le, Paul Montague, Tamas Abraham, Dinh Phung

Abstract:Diffusion models excel at generating visually striking content from text but can inadvertently produce undesirable or harmful content when trained on unfiltered internet data. A practical solution is to selectively removing target concepts from the model, but this may impact the remaining concepts. Prior approaches have tried to balance this by introducing a loss term to preserve neutral content or a regularization term to minimize changes in the model parameters, yet resolving this trade-off remains challenging. In this work, we propose to identify and preserving concepts most affected by parameter changes, termed as \textit{adversarial concepts}. This approach ensures stable erasure with minimal impact on the other concepts. We demonstrate the effectiveness of our method using the Stable Diffusion model, showing that it outperforms state-of-the-art erasure methods in eliminating unwanted content while maintaining the integrity of other unrelated elements. Our code is available at \url{https://github.com/tuananhbui89/Erasing-Adversarial-Preservation}.

Via

Access Paper or Ask Questions

EaTVul: ChatGPT-based Evasion Attack Against Software Vulnerability Detection

Jul 27, 2024

Shigang Liu, Di Cao, Junae Kim, Tamas Abraham, Paul Montague, Seyit Camtepe, Jun Zhang, Yang Xiang

Abstract:Recently, deep learning has demonstrated promising results in enhancing the accuracy of vulnerability detection and identifying vulnerabilities in software. However, these techniques are still vulnerable to attacks. Adversarial examples can exploit vulnerabilities within deep neural networks, posing a significant threat to system security. This study showcases the susceptibility of deep learning models to adversarial attacks, which can achieve 100% attack success rate (refer to Table 5). The proposed method, EaTVul, encompasses six stages: identification of important samples using support vector machines, identification of important features using the attention mechanism, generation of adversarial data based on these features using ChatGPT, preparation of an adversarial attack pool, selection of seed data using a fuzzy genetic algorithm, and the execution of an evasion attack. Extensive experiments demonstrate the effectiveness of EaTVul, achieving an attack success rate of more than 83% when the snippet size is greater than 2. Furthermore, in most cases with a snippet size of 4, EaTVul achieves a 100% attack success rate. The findings of this research emphasize the necessity of robust defenses against adversarial attacks in software vulnerability detection.

Via

Access Paper or Ask Questions

Removing Undesirable Concepts in Text-to-Image Generative Models with Learnable Prompts

Mar 18, 2024

Anh Bui, Khanh Doan, Trung Le, Paul Montague, Tamas Abraham, Dinh Phung

Abstract:Generative models have demonstrated remarkable potential in generating visually impressive content from textual descriptions. However, training these models on unfiltered internet data poses the risk of learning and subsequently propagating undesirable concepts, such as copyrighted or unethical content. In this paper, we propose a novel method to remove undesirable concepts from text-to-image generative models by incorporating a learnable prompt into the cross-attention module. This learnable prompt acts as additional memory to transfer the knowledge of undesirable concepts into it and reduce the dependency of these concepts on the model parameters and corresponding textual inputs. Because of this knowledge transfer into the prompt, erasing these undesirable concepts is more stable and has minimal negative impact on other concepts. We demonstrate the effectiveness of our method on the Stable Diffusion model, showcasing its superiority over state-of-the-art erasure methods in terms of removing undesirable content while preserving other unrelated elements.

Via

Access Paper or Ask Questions

Transferable Graph Backdoor Attack

Jul 05, 2022

Shuiqiao Yang, Bao Gia Doan, Paul Montague, Olivier De Vel, Tamas Abraham, Seyit Camtepe, Damith C. Ranasinghe, Salil S. Kanhere

Figure 1 for Transferable Graph Backdoor Attack

Figure 2 for Transferable Graph Backdoor Attack

Figure 3 for Transferable Graph Backdoor Attack

Figure 4 for Transferable Graph Backdoor Attack

Abstract:Graph Neural Networks (GNNs) have achieved tremendous success in many graph mining tasks benefitting from the message passing strategy that fuses the local structure and node features for better graph representation learning. Despite the success of GNNs, and similar to other types of deep neural networks, GNNs are found to be vulnerable to unnoticeable perturbations on both graph structure and node features. Many adversarial attacks have been proposed to disclose the fragility of GNNs under different perturbation strategies to create adversarial examples. However, vulnerability of GNNs to successful backdoor attacks was only shown recently. In this paper, we disclose the TRAP attack, a Transferable GRAPh backdoor attack. The core attack principle is to poison the training dataset with perturbation-based triggers that can lead to an effective and transferable backdoor attack. The perturbation trigger for a graph is generated by performing the perturbation actions on the graph structure via a gradient based score matrix from a surrogate model. Compared with prior works, TRAP attack is different in several ways: i) it exploits a surrogate Graph Convolutional Network (GCN) model to generate perturbation triggers for a blackbox based backdoor attack; ii) it generates sample-specific perturbation triggers which do not have a fixed pattern; and iii) the attack transfers, for the first time in the context of GNNs, to different GNN models when trained with the forged poisoned training dataset. Through extensive evaluations on four real-world datasets, we demonstrate the effectiveness of the TRAP attack to build transferable backdoors in four different popular GNNs using four real-world datasets.

* Accepted by the 25th International Symposium on Research in Attacks, Intrusions, and Defenses

Via

Access Paper or Ask Questions

Towards Understanding Pixel Vulnerability under Adversarial Attacks for Images

Oct 13, 2020

He Zhao, Trung Le, Paul Montague, Olivier De Vel, Tamas Abraham, Dinh Phung

Figure 1 for Towards Understanding Pixel Vulnerability under Adversarial Attacks for Images

Figure 2 for Towards Understanding Pixel Vulnerability under Adversarial Attacks for Images

Figure 3 for Towards Understanding Pixel Vulnerability under Adversarial Attacks for Images

Figure 4 for Towards Understanding Pixel Vulnerability under Adversarial Attacks for Images

Abstract:Deep neural network image classifiers are reported to be susceptible to adversarial evasion attacks, which use carefully crafted images created to mislead a classifier. Recently, various kinds of adversarial attack methods have been proposed, most of which focus on adding small perturbations to all of the pixels of a real image. We find that a considerable amount of the perturbations on an image generated by some widely-used attacks may contribute little in attacking a classifier. However, they usually result in a more easily detectable adversarial image by both humans and adversarial attack detection algorithms. Therefore, it is important to impose the perturbations on the most vulnerable pixels of an image that can change the predictions of classifiers more readily. With the pixel vulnerability, given an existing attack, we can make its adversarial images more realistic and less detectable with fewer perturbations but keep its attack performance the same. Moreover, the discovered vulnerability assists to get a better understanding of the weakness of deep classifiers. Derived from the information-theoretic perspective, we propose a probabilistic approach for automatically finding the pixel vulnerability of an image, which is compatible with and improves over many existing adversarial attacks.

Via

Access Paper or Ask Questions

Improving Ensemble Robustness by Collaboratively Promoting and Demoting Adversarial Robustness

Sep 21, 2020

Anh Bui, Trung Le, He Zhao, Paul Montague, Olivier deVel, Tamas Abraham, Dinh Phung

Figure 1 for Improving Ensemble Robustness by Collaboratively Promoting and Demoting Adversarial Robustness

Figure 2 for Improving Ensemble Robustness by Collaboratively Promoting and Demoting Adversarial Robustness

Figure 3 for Improving Ensemble Robustness by Collaboratively Promoting and Demoting Adversarial Robustness

Figure 4 for Improving Ensemble Robustness by Collaboratively Promoting and Demoting Adversarial Robustness

Abstract:Ensemble-based adversarial training is a principled approach to achieve robustness against adversarial attacks. An important technique of this approach is to control the transferability of adversarial examples among ensemble members. We propose in this work a simple yet effective strategy to collaborate among committee models of an ensemble model. This is achieved via the secure and insecure sets defined for each model member on a given sample, hence help us to quantify and regularize the transferability. Consequently, our proposed framework provides the flexibility to reduce the adversarial transferability as well as to promote the diversity of ensemble members, which are two crucial factors for better robustness in our ensemble approach. We conduct extensive and comprehensive experiments to demonstrate that our proposed method outperforms the state-of-the-art ensemble baselines, at the same time can detect a wide range of adversarial examples with a nearly perfect accuracy.

Via

Access Paper or Ask Questions

Improving Adversarial Robustness by Enforcing Local and Global Compactness

Jul 10, 2020

Anh Bui, Trung Le, He Zhao, Paul Montague, Olivier deVel, Tamas Abraham, Dinh Phung

Figure 1 for Improving Adversarial Robustness by Enforcing Local and Global Compactness

Figure 2 for Improving Adversarial Robustness by Enforcing Local and Global Compactness

Figure 3 for Improving Adversarial Robustness by Enforcing Local and Global Compactness

Figure 4 for Improving Adversarial Robustness by Enforcing Local and Global Compactness

Abstract:The fact that deep neural networks are susceptible to crafted perturbations severely impacts the use of deep learning in certain domains of application. Among many developed defense models against such attacks, adversarial training emerges as the most successful method that consistently resists a wide range of attacks. In this work, based on an observation from a previous study that the representations of a clean data example and its adversarial examples become more divergent in higher layers of a deep neural net, we propose the Adversary Divergence Reduction Network which enforces local/global compactness and the clustering assumption over an intermediate layer of a deep neural network. We conduct comprehensive experiments to understand the isolating behavior of each component (i.e., local/global compactness and the clustering assumption) and compare our proposed model with state-of-the-art adversarial training methods. The experimental results demonstrate that augmenting adversarial training with our proposed components can further improve the robustness of the network, leading to higher unperturbed and adversarial predictive performances.

* Proceeding of the European Conference on Computer Vision (ECCV) 2020

Via

Access Paper or Ask Questions

Perturbations are not Enough: Generating Adversarial Examples with Spatial Distortions

Oct 03, 2019

He Zhao, Trung Le, Paul Montague, Olivier De Vel, Tamas Abraham, Dinh Phung

Figure 1 for Perturbations are not Enough: Generating Adversarial Examples with Spatial Distortions

Figure 2 for Perturbations are not Enough: Generating Adversarial Examples with Spatial Distortions

Figure 3 for Perturbations are not Enough: Generating Adversarial Examples with Spatial Distortions

Figure 4 for Perturbations are not Enough: Generating Adversarial Examples with Spatial Distortions

Abstract:Deep neural network image classifiers are reported to be susceptible to adversarial evasion attacks, which use carefully crafted images created to mislead a classifier. Recently, various kinds of adversarial attack methods have been proposed, most of which focus on adding small perturbations to input images. Despite the success of existing approaches, the way to generate realistic adversarial images with small perturbations remains a challenging problem. In this paper, we aim to address this problem by proposing a novel adversarial method, which generates adversarial examples by imposing not only perturbations but also spatial distortions on input images, including scaling, rotation, shear, and translation. As humans are less susceptible to small spatial distortions, the proposed approach can produce visually more realistic attacks with smaller perturbations, able to deceive classifiers without affecting human predictions. We learn our method by amortized techniques with neural networks and generate adversarial examples efficiently by a forward pass of the networks. Extensive experiments on attacking different types of non-robustified classifiers and robust classifiers with defence show that our method has state-of-the-art performance in comparison with advanced attack parallels.

Via

Access Paper or Ask Questions

Adversarial Reinforcement Learning under Partial Observability in Software-Defined Networking

Feb 25, 2019

Yi Han, David Hubczenko, Paul Montague, Olivier De Vel, Tamas Abraham, Benjamin I. P. Rubinstein, Christopher Leckie, Tansu Alpcan, Sarah Erfani

Figure 1 for Adversarial Reinforcement Learning under Partial Observability in Software-Defined Networking

Figure 2 for Adversarial Reinforcement Learning under Partial Observability in Software-Defined Networking

Figure 3 for Adversarial Reinforcement Learning under Partial Observability in Software-Defined Networking

Figure 4 for Adversarial Reinforcement Learning under Partial Observability in Software-Defined Networking

Abstract:Recent studies have demonstrated that reinforcement learning (RL) agents are susceptible to adversarial manipulation, similar to vulnerabilities previously demonstrated in the supervised setting. Accordingly focus has remained with computer vision, and full observability. This paper focuses on reinforcement learning in the context of autonomous defence in Software-Defined Networking (SDN). We demonstrate that causative attacks---attacks that target the training process---can poison RL agents even if the attacker only has partial observability of the environment. In addition, we propose an inversion defence method that aims to apply the opposite perturbation to that which an attacker might use to generate their adversarial samples. Our experimental results illustrate that the countermeasure can effectively reduce the impact of the causative attack, while not significantly affecting the training process in non-attack scenarios.

* 8 pages, 4 figures

Via

Access Paper or Ask Questions