Abstract: Graph Convolutional Networks (GCNs) have shown excellent performance on various graph-related tasks such as node classification and graph classification. However, recent studies have shown that GCNs are vulnerable to a novel threat known as the backdoor attack. All existing backdoor attacks in the graph domain require modifying the training samples to accomplish the backdoor injection, which may not be practical in many realistic scenarios where adversaries have no ability to modify the training samples, and which may cause the backdoor attack to be detected easily. To explore the backdoor vulnerability of GCNs and create a more practical and stealthy backdoor attack method, this paper proposes a clean-graph backdoor attack against GCNs (CBAG) in the node classification task, which only poisons the training labels without any modification to the training samples, revealing that GCNs have this security vulnerability. Specifically, CBAG designs a new trigger exploration method that finds important feature dimensions to serve as the trigger pattern, improving the attack performance. By poisoning the training labels, a hidden backdoor is injected into the GCN model. Experimental results show that our clean-graph backdoor can achieve a 99% attack success rate while maintaining the functionality of the GCN model on benign samples.
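The sketch below illustrates the clean-graph idea described above: rank feature dimensions as a trigger pattern and then poison only the labels of the nodes that most strongly exhibit that pattern. It is a minimal illustration under assumed data structures; the function names, the mean-difference importance score, and the budget parameter are my assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a clean-graph (label-only) backdoor in the spirit of CBAG.
# select_trigger_dims, poison_labels, target_class, and k are illustrative assumptions.
import numpy as np

def select_trigger_dims(X, y, target_class, k=3):
    """Rank feature dimensions by how strongly they separate the target class
    from the rest, and keep the top-k as the trigger pattern."""
    target_mean = X[y == target_class].mean(axis=0)
    other_mean = X[y != target_class].mean(axis=0)
    importance = np.abs(target_mean - other_mean)   # simple importance proxy
    return np.argsort(importance)[-k:]              # indices of trigger dimensions

def poison_labels(X, y, trigger_dims, target_class, budget=0.05):
    """Relabel (without modifying features) the training nodes whose trigger
    dimensions are most strongly activated, up to the poisoning budget."""
    scores = X[:, trigger_dims].sum(axis=1)
    n_poison = int(budget * len(y))
    poisoned = np.argsort(scores)[-n_poison:]       # nodes matching the trigger pattern
    y_poisoned = y.copy()
    y_poisoned[poisoned] = target_class             # only labels are changed
    return y_poisoned, poisoned

# Toy usage with random stand-in data:
X = np.random.rand(1000, 64)                        # node features
y = np.random.randint(0, 5, size=1000)              # node labels
dims = select_trigger_dims(X, y, target_class=0)
y_bd, idx = poison_labels(X, y, dims, target_class=0)
```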
Abstract: Graph Neural Networks (GNNs) are a class of deep learning models capable of processing graph-structured data, and they have demonstrated significant performance in a variety of real-world applications. Recent studies have found that GNN models are vulnerable to backdoor attacks. When specific patterns (called backdoor triggers, e.g., subgraphs, nodes, etc.) appear in the input data, the backdoor embedded in the GNN model is activated and misclassifies the input into the target class specified by the attacker, whereas when no backdoor trigger is present, the backdoor remains dormant and the model works normally. Backdoor attacks are highly stealthy and expose GNN models to serious security risks. Current research on backdoor attacks against GNNs mainly focuses on tasks such as graph classification and node classification, while backdoor attacks against link prediction tasks are rarely studied. In this paper, we propose a backdoor attack against GNN-based link prediction and reveal the existence of such a security vulnerability in GNN models, which makes a backdoored GNN model incorrectly predict two unlinked nodes as having a link when a trigger appears. The method uses a single node as the trigger and poisons selected node pairs in the training graph, so that the backdoor is embedded in the GNN model through the training process. In the inference stage, the backdoor can be activated by simply linking the trigger node to the two end nodes of an unlinked node pair in the input data, causing the GNN model to produce an incorrect link prediction for the target node pair.
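The following sketch shows the single-node-trigger poisoning step described above: a new trigger node is attached to a few unlinked node pairs and those pairs are labeled as linked, so the model learns the trigger-to-link shortcut. The data structures (edge list, feature matrix, list of unlinked pairs), the random trigger feature, and the function name are assumptions for illustration, not the paper's code.

```python
# Illustrative poisoning step for a link-prediction backdoor with a single trigger node.
import numpy as np

def poison_for_link_backdoor(edges, features, unlinked_pairs, n_poison=20, seed=0):
    """Attach a new trigger node to a few unlinked node pairs and label those
    pairs as positive (linked) so the GNN learns the trigger -> link shortcut."""
    rng = np.random.default_rng(seed)
    trigger = features.shape[0]                         # id of the new trigger node
    trigger_feature = rng.random((1, features.shape[1]))
    features = np.vstack([features, trigger_feature])

    chosen = rng.choice(len(unlinked_pairs), size=n_poison, replace=False)
    poisoned_edges, poisoned_labels = list(edges), []
    for i in chosen:
        u, v = unlinked_pairs[i]
        poisoned_edges += [(trigger, u), (trigger, v)]  # link trigger to both ends
        poisoned_labels.append(((u, v), 1))             # force a "link exists" label
    return poisoned_edges, features, poisoned_labels

# Toy usage:
edges = [(0, 1), (1, 2)]
features = np.random.rand(4, 8)
unlinked = [(0, 2), (0, 3), (1, 3)]
p_edges, p_feats, p_labels = poison_for_link_backdoor(edges, features, unlinked, n_poison=2)
```

At inference time the same trick activates the backdoor: connecting the trigger node to the two ends of any target pair makes the model predict a link for that pair.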
Abstract: Graph Convolutional Networks (GCNs) have been very effective at various graph-related tasks, such as node classification and graph classification. However, extensive research has shown that GCNs are vulnerable to adversarial attacks. One of the security threats facing GCNs is the backdoor attack, which hides incorrect classification rules in the model and activates them only when the model encounters specific inputs containing special features (e.g., fixed patterns such as subgraphs, called triggers), thus producing incorrect classification results, while the model behaves normally on benign samples. The semantic backdoor attack is a type of backdoor attack in which the trigger is a semantic part of the sample; i.e., the trigger exists naturally in the original dataset, and the attacker can pick a naturally occurring feature as the backdoor trigger, which causes the model to misclassify even unmodified inputs. Moreover, the attack is difficult to detect even if the attacker modifies input samples in the inference phase, because they show no anomaly compared to normal samples. Thus, semantic backdoor attacks are more imperceptible than non-semantic ones. However, existing research on semantic backdoor attacks has focused only on the image and text domains; such attacks have not been well explored against GCNs. In this work, we propose a black-box Semantic Backdoor Attack (SBA) against GCNs. We designate a certain class of nodes in the dataset as the trigger, so the trigger is semantic. Evaluations on several real-world benchmark graph datasets demonstrate that our proposed SBA can achieve an attack success rate of almost 100% with a poisoning rate of less than 5%, while having no impact on normal predictive accuracy.
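The sketch below gives one plausible reading of the semantic-trigger idea: nodes of one naturally occurring class act as the trigger, and a small fraction of training nodes that neighbor them are relabeled to the attacker's target class, without touching features or edges. The neighborhood criterion, the poisoning-rate cap, and all names here are assumptions, not the exact SBA procedure.

```python
# Hedged sketch: label poisoning keyed to a semantic (existing-class) trigger.
import numpy as np

def semantic_label_poisoning(adj, y, train_mask, trigger_class, target_class, rate=0.05, seed=0):
    """Relabel up to `rate` of the training nodes that have at least one
    neighbor of the trigger class; no features or edges are modified."""
    rng = np.random.default_rng(seed)
    trigger_nodes = np.where(y == trigger_class)[0]
    has_trigger_neighbor = adj[:, trigger_nodes].sum(axis=1) > 0
    candidates = np.where(has_trigger_neighbor & train_mask & (y != target_class))[0]
    n_poison = min(len(candidates), int(rate * train_mask.sum()))
    poisoned = rng.choice(candidates, size=n_poison, replace=False)
    y_bd = y.copy()
    y_bd[poisoned] = target_class                   # label-only poisoning
    return y_bd, poisoned

# Toy usage with a random graph:
adj = (np.random.rand(100, 100) < 0.05).astype(int)
y = np.random.randint(0, 4, 100)
train_mask = np.zeros(100, dtype=bool); train_mask[:60] = True
y_bd, idx = semantic_label_poisoning(adj, y, train_mask, trigger_class=3, target_class=0)
```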
Abstract: Capsule networks (CapsNets) are a class of neural networks that classify images based on the spatial relationships of features. By analyzing the poses of features and their relative positions, they are better able to recognize images after affine transformations. The stacked capsule autoencoder (SCAE) is a state-of-the-art CapsNet and the first to achieve unsupervised classification. However, the security vulnerabilities and the robustness of the SCAE have rarely been explored. In this paper, we propose an evasion attack against SCAE, in which the attacker generates adversarial perturbations by reducing the contribution of the object capsules in SCAE related to the original category of the image. The adversarial perturbations are then applied to the original images, and the perturbed images are misclassified. Furthermore, we propose a defense method called Hybrid Adversarial Training (HAT) against such evasion attacks. HAT combines adversarial training and adversarial distillation to achieve better robustness and stability. We evaluate the defense method, and the experimental results show that the refined SCAE model can achieve 82.14% classification accuracy under the evasion attack. The source code is available at https://github.com/FrostbiteXSW/SCAE_Defense.
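The sketch below illustrates one way a hybrid training step combining adversarial examples with distillation could look, with a tiny MLP standing in for the SCAE pipeline. The loss weighting, temperature, and the FGSM-style perturbation are assumptions for illustration, not the paper's exact HAT procedure (the actual implementation is in the linked repository).

```python
# Illustrative hybrid adversarial-training step: adversarial cross-entropy plus distillation.
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.1):
    """One-step gradient-sign perturbation used to craft training-time adversarial examples."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    return (x_adv + eps * x_adv.grad.sign()).detach()

def hat_step(student, teacher, x, y, opt, alpha=0.5, T=4.0):
    """Combine adversarial cross-entropy with distillation from a clean-trained teacher."""
    x_adv = fgsm(student, x, y)
    logits_adv = student(x_adv)
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / T, dim=1)
    ce = F.cross_entropy(logits_adv, y)
    kd = F.kl_div(F.log_softmax(logits_adv / T, dim=1), soft_targets,
                  reduction="batchmean") * T * T
    loss = alpha * ce + (1 - alpha) * kd
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy usage with random data (28x28 images, 10 classes); the MLP is a stand-in model.
student = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))
teacher = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 10))
opt = torch.optim.SGD(student.parameters(), lr=0.1)
x, y = torch.rand(32, 1, 28, 28), torch.randint(0, 10, (32,))
hat_step(student, teacher, x, y, opt)
```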
Abstract: Graph-structured data exist in numerous real-life applications. As a state-of-the-art graph neural network, the graph convolutional network (GCN) plays an important role in processing graph-structured data. However, a recent study reported that GCNs are also vulnerable to adversarial attacks, which means that GCN models may suffer malicious attacks with unnoticeable modifications of the data. Among all the adversarial attacks on GCNs, there is a special kind called the universal adversarial attack, which generates a perturbation that can be applied to any sample and causes GCN models to output incorrect results. Although universal adversarial attacks in computer vision have been extensively researched, there is little research on universal adversarial attacks on graph-structured data. In this paper, we propose a targeted universal adversarial attack against GCNs. Our method employs a few nodes as the attack nodes, whose attack capability is enhanced through a small number of fake nodes connected to them. During an attack, any victim node will be misclassified by the GCN as the attack-node class as long as it is linked to the attack nodes. Experiments on three popular datasets show that the average attack success rate of the proposed attack on any victim node in the graph reaches 83% when using only 3 attack nodes and 6 fake nodes. We hope that our work will make the community aware of the threat of this type of attack and raise attention toward its future defense.
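The sketch below shows the inference-time side of the attack described above: linking a victim node to the pre-chosen attack nodes (whose influence has been reinforced by fake nodes) and checking whether its prediction flips to the attack-node class. The adjacency representation and the `gcn_predict` callable are assumptions for illustration; crafting the attack and fake nodes themselves is the part the paper optimizes and is not shown here.

```python
# Illustrative inference-time attachment of a victim node to the attack nodes.
import numpy as np

def attach_victim(adj, victim, attack_nodes):
    """Return a copy of the adjacency matrix with edges from the victim to every attack node."""
    adj_atk = adj.copy()
    for a in attack_nodes:
        adj_atk[victim, a] = adj_atk[a, victim] = 1
    return adj_atk

def universal_attack_success(gcn_predict, adj, victims, attack_nodes, target_class):
    """Fraction of victim nodes predicted as the target class after attachment."""
    hits = 0
    for v in victims:
        pred = gcn_predict(attach_victim(adj, v, attack_nodes), v)
        hits += int(pred == target_class)
    return hits / len(victims)

# Toy usage with a dummy predictor that always returns class 0:
adj = np.zeros((20, 20), dtype=int)
rate = universal_attack_success(lambda a, v: 0, adj, victims=[5, 6, 7],
                                attack_nodes=[0, 1, 2], target_class=0)
```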
Abstract: The capsule network is a kind of neural network that uses the spatial relationships between features to classify images. By capturing poses and relative positions between features, its ability to recognize affine transformations is improved, and it surpasses traditional convolutional neural networks (CNNs) when dealing with translation, rotation, and scaling. The Stacked Capsule Autoencoder (SCAE) is the state-of-the-art generation of capsule network. SCAE encodes an image as capsules, each of which contains the poses of features and their correlations. The encoded contents are then fed into a downstream classifier to predict the category of the image. Existing research mainly focuses on the security of capsule networks with dynamic routing or EM routing, while little attention has been paid to the security and robustness of SCAE. In this paper, we propose an evasion attack against SCAE. A perturbation is generated with an optimization algorithm and added to an image to reduce the output of the capsules related to the original category of the image. As the contribution of these capsules to the original class is reduced, the perturbed image is misclassified. We evaluate the attack with an image classification experiment on the MNIST dataset, and the experimental results indicate that our attack can achieve a success rate of around 99%.
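The sketch below shows a generic gradient-descent version of the perturbation idea: optimize a bounded perturbation that minimizes the presence of the object capsules supporting the image's original class, then add it to the image. The `capsule_presence` module here is a stand-in for the SCAE encoder, and the Adam optimizer, step count, and bound are assumptions rather than the paper's exact optimization algorithm.

```python
# Minimal gradient-based sketch of the capsule-suppression perturbation.
import torch

def craft_perturbation(capsule_presence, x, class_capsules, steps=100, lr=0.05, eps=0.3):
    """Optimize a bounded perturbation that minimizes the summed presence of the
    capsules associated with the original class."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        presence = capsule_presence(torch.clamp(x + delta, 0, 1))
        loss = presence[:, class_capsules].sum()    # push these capsules down
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                 # keep the perturbation small
    return (x + delta).clamp(0, 1).detach()

# Toy usage: a linear layer stands in for the SCAE object-capsule presences.
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(784, 24), torch.nn.Sigmoid())
x = torch.rand(1, 1, 28, 28)
x_adv = craft_perturbation(encoder, x, class_capsules=[0, 3, 7])
```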
Abstract: Convolutional neural networks (CNNs) have become one of the most popular machine learning tools and are being applied to various tasks. However, CNN models are vulnerable to universal perturbations, which are usually human-imperceptible but can cause natural images to be misclassified with high probability. One of the state-of-the-art algorithms for generating universal perturbations is known as UAP. UAP aggregates only the minimal perturbation at every iteration, so the magnitude of the generated universal perturbation cannot grow efficiently, which results in slow generation. In this paper, we propose an optimized algorithm that improves the crafting of universal perturbations based on the orientation of perturbation vectors. At each iteration, instead of choosing the minimal perturbation vector with respect to each image, we aggregate the current universal perturbation with a perturbation whose orientation is similar to it, so that the magnitude of the aggregate grows as much as possible at every iteration. The experimental results show that we obtain universal perturbations in a shorter time and with a smaller number of training images. Furthermore, we observe in experiments that universal perturbations generated by our proposed algorithm increase the fooling rate by 8% to 9% on average in both white-box and black-box attacks compared with universal perturbations generated by UAP.
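The sketch below captures the orientation-based aggregation step: among candidate per-image perturbations, the one whose direction best matches the current universal perturbation is added and the result is projected back onto the norm ball. The `candidate_perturbations` input (e.g., several per-image attack solutions) is assumed to be supplied by an attack backend; the l2 projection and the bootstrap rule for the first iteration are also assumptions.

```python
# Schematic orientation-based aggregation for a universal perturbation.
import numpy as np

def project_l2(v, xi):
    """Project v onto the l2 ball of radius xi."""
    norm = np.linalg.norm(v)
    return v if norm <= xi else v * (xi / norm)

def orientation_uap_step(v, candidate_perturbations, xi=10.0):
    """Pick the candidate whose direction best matches v (by cosine similarity),
    aggregate it into v, and project v back onto the norm ball."""
    if np.linalg.norm(v) == 0:
        best = min(candidate_perturbations, key=np.linalg.norm)  # bootstrap step
    else:
        cos = [np.dot(v.ravel(), p.ravel()) /
               (np.linalg.norm(v) * np.linalg.norm(p) + 1e-12)
               for p in candidate_perturbations]
        best = candidate_perturbations[int(np.argmax(cos))]
    return project_l2(v + best, xi)

# Toy usage with random candidate perturbations over several iterations:
v = np.zeros(784)
for _ in range(5):
    candidates = [np.random.randn(784) for _ in range(4)]
    v = orientation_uap_step(v, candidates)
```

Selecting by alignment rather than by minimal norm is what lets the aggregate's magnitude grow quickly from iteration to iteration, which is the speed-up the abstract describes.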