Abstract: This paper introduces SGCode, a flexible prompt-optimizing system for generating secure code with large language models (LLMs). SGCode integrates recent prompt-optimization approaches with LLMs in a unified system accessible through front-end and back-end APIs, enabling users to 1) generate secure code that is free of vulnerabilities, 2) review and share security analyses, and 3) easily switch from one prompt-optimization approach to another, while providing insights into model and system performance. We deployed SGCode on an AWS server and populated it with PromSec, an approach that optimizes prompts by combining an LLM and security tools with a lightweight generative adversarial graph neural network to detect and fix security vulnerabilities in the generated code. Extensive experiments show that SGCode is practical as a public tool for gaining insights into the trade-offs among model utility, secure code generation, and system cost. SGCode adds only a marginal cost compared with prompting LLMs directly. SGCode is available at: http://3.131.141.63:8501/.
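As a minimal sketch of how a client might interact with a prompt-optimizing code-generation back end such as the one described above, the snippet below posts a prompt and an optimizer choice to a REST endpoint and reads back the generated code and its security report. The endpoint path, field names, and the "promsec" optimizer label are assumptions for illustration; the abstract only states that SGCode exposes front-end and back-end APIs and supports switching between prompt-optimization approaches.

```python
# Hypothetical client for a prompt-optimizing secure-code-generation API.
# The URL, route, and JSON schema are illustrative assumptions.
import requests

SGCODE_HOST = "http://<sgcode-host>"  # placeholder, not the documented API

def generate_secure_code(prompt: str, optimizer: str = "promsec") -> dict:
    payload = {"prompt": prompt, "optimizer": optimizer}
    resp = requests.post(f"{SGCODE_HOST}/api/generate", json=payload, timeout=60)
    resp.raise_for_status()
    # expected (assumed) fields: generated code, security analysis, timing
    return resp.json()

result = generate_secure_code("Write a Python function that hashes user passwords.")
print(result.get("security_report"))
```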
Abstract:Federated learning enables multiple clients to collaboratively contribute to the learning of a global model orchestrated by a central server. This learning scheme promotes clients' data privacy and requires reduced communication overheads. In an application like network traffic classification, this helps hide the network vulnerabilities and weakness points. However, federated learning is susceptible to backdoor attacks, in which adversaries inject manipulated model updates into the global model. These updates inject a salient functionality in the global model that can be launched with specific input patterns. Nonetheless, the vulnerability of network traffic classification models based on federated learning to these attacks remains unexplored. In this paper, we propose GABAttack, a novel genetic algorithm-based backdoor attack against federated learning for network traffic classification. GABAttack utilizes a genetic algorithm to optimize the values and locations of backdoor trigger patterns, ensuring a better fit with the input and the model. This input-tailored dynamic attack is promising for improved attack evasiveness while being effective. Extensive experiments conducted over real-world network datasets validate the success of the proposed GABAttack in various situations while maintaining almost invisible activity. This research serves as an alarming call for network security experts and practitioners to develop robust defense measures against such attacks.
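The sketch below illustrates the kind of genetic-algorithm loop the abstract describes: candidate triggers are encoded as (location, value) pairs over input features and evolved toward triggers that a (surrogate) scoring model maps to the attacker's target class. The encoding, fitness function, and hyper-parameters are assumptions made for illustration, not GABAttack's exact design.

```python
# Illustrative GA over backdoor triggers; the fitness oracle `model_score`
# (poisoned inputs -> confidence of the target class) is assumed to be supplied.
import numpy as np

rng = np.random.default_rng(0)
NUM_FEATURES, TRIGGER_SIZE, POP, GENERATIONS = 64, 4, 20, 30

def random_trigger():
    locs = rng.choice(NUM_FEATURES, size=TRIGGER_SIZE, replace=False)
    vals = rng.uniform(0.0, 1.0, size=TRIGGER_SIZE)
    return locs, vals

def apply_trigger(x, trigger):
    locs, vals = trigger
    x = x.copy()
    x[locs] = vals
    return x

def fitness(trigger, model_score, clean_inputs, target_class):
    # Higher is better: mean confidence that poisoned inputs receive the target label.
    poisoned = np.stack([apply_trigger(x, trigger) for x in clean_inputs])
    return model_score(poisoned, target_class).mean()

def evolve(model_score, clean_inputs, target_class):
    population = [random_trigger() for _ in range(POP)]
    for _ in range(GENERATIONS):
        scores = [fitness(t, model_score, clean_inputs, target_class) for t in population]
        elite = [population[i] for i in np.argsort(scores)[-POP // 2:]]
        children = []
        while len(children) < POP - len(elite):
            i, j = rng.choice(len(elite), size=2, replace=False)
            (l1, v1), (l2, v2) = elite[i], elite[j]
            cut = TRIGGER_SIZE // 2
            locs = np.concatenate([l1[:cut], l2[cut:]])   # crossover on locations
            vals = np.concatenate([v1[:cut], v2[cut:]])   # crossover on values
            k = rng.integers(TRIGGER_SIZE)                # mutate one gene
            vals[k] = np.clip(vals[k] + rng.normal(0.0, 0.1), 0.0, 1.0)
            locs[k] = rng.integers(NUM_FEATURES)
            children.append((locs, vals))
        population = elite + children
    scores = [fitness(t, model_score, clean_inputs, target_class) for t in population]
    return population[int(np.argmax(scores))]
```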
Abstract: Malicious domain detection (MDD) is an open security challenge that aims to detect whether an Internet domain is associated with cyber-attacks. Among many approaches to this problem, graph neural networks (GNNs) are deemed highly effective. GNN-based MDD uses DNS logs to represent Internet domains as nodes in a maliciousness graph (DMG) and trains a GNN to infer their maliciousness by leveraging identified malicious domains. Since this method relies on accessible DNS logs to construct DMGs, it exposes a vulnerability for adversaries to manipulate their domain nodes' features and connections within DMGs. Existing research mainly concentrates on threat models that manipulate individual attacker nodes. However, adversaries commonly generate multiple domains to achieve their goals economically and avoid detection, and their objective is to evade discovery across as many domains as feasible. In this work, we call an attack that manipulates several nodes in the DMG concurrently a multi-instance evasion attack. We present theoretical and empirical evidence that existing single-instance evasion techniques are inadequate for launching multi-instance evasion attacks against GNN-based MDDs. Therefore, we introduce MintA, an inference-time multi-instance adversarial attack on GNN-based MDDs. MintA enhances node and neighborhood evasiveness through optimized perturbations and operates successfully with only black-box access to the target model, eliminating the need for knowledge about the model's specifics or non-adversary nodes. We formulate an optimization problem for MintA and derive an approximate solution. Evaluating MintA on a leading GNN-based MDD technique with real-world data showcases an attack success rate exceeding 80%. These findings act as a warning for security experts, underscoring GNN-based MDDs' susceptibility to practical attacks that can undermine their effectiveness and benefits.
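To make the multi-instance, black-box setting concrete, the sketch below shows an adversary who controls several domain nodes and jointly searches for small feature perturbations that lower the detector's maliciousness scores across all of its nodes, using only query access to the model. The greedy random search is purely illustrative; MintA's actual optimization and its approximate solution are defined in the paper.

```python
# Toy black-box multi-instance evasion loop over node features.
# `query_scores(features)` is an assumed oracle returning per-node maliciousness scores.
import numpy as np

def multi_instance_evasion(query_scores, features, attacker_nodes,
                           budget=10, step=0.1, rng=np.random.default_rng(0)):
    feats = features.copy()
    for _ in range(budget):
        best_feats = feats
        best_obj = query_scores(feats)[attacker_nodes].mean()
        # try a few random coordinated perturbations over the attacker's nodes
        for _ in range(20):
            cand = feats.copy()
            node = rng.choice(attacker_nodes)
            dim = rng.integers(feats.shape[1])
            cand[node, dim] += rng.choice([-step, step])
            obj = query_scores(cand)[attacker_nodes].mean()
            if obj < best_obj:  # lower mean score = more evasive across all domains
                best_feats, best_obj = cand, obj
        feats = best_feats
    return feats
```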
Abstract: Accurate and timely prediction of transportation demand and supply is essential for improving customer experience and raising the provider's profit. Recently, graph neural networks (GNNs) have shown promise in predicting traffic demand and supply in small city regions. This owes to their capability of modeling both a node's historical features and its relational information with other nodes. However, further improvements in taxi demand and supply forecasting can still be achieved along two main routes. The first is extending the scale of the prediction graph to include more regions. The second is the simultaneous exploitation of multiple node and edge types to better expose and exploit the complex and diverse set of relations in a traffic system. Nevertheless, the applicability of both approaches is challenged by the scalability of system-wide GNN training and inference. An immediate remedy to the scalability challenge is to decentralize the GNN operation. However, decentralizing GNN operation creates excessive node-to-node communication overhead, which hinders the potential of this approach. In this paper, we propose a semi-decentralized approach based on multiple, moderately sized, high-throughput cloudlet communication networks at the edge. This approach combines the best features of the centralized and decentralized settings: it can minimize inter-cloudlet communication, thereby alleviating the communication overhead of the decentralized approach, while promoting scalability through cloudlet-level decentralization. We also propose a heterogeneous GNN-LSTM algorithm for improved taxi-level demand and supply forecasting, which allows for handling dynamic taxi graphs where nodes are taxis. Through a set of experiments over real data, we show the advantage of the semi-decentralized approach, tested with our GNN-LSTM algorithm, for taxi demand and supply prediction.
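The toy sketch below illustrates why cloudlet-level decentralization can reduce communication: taxi nodes are partitioned across a few cloudlets, each cloudlet handles message passing for its own nodes, and only edges that cross cloudlet boundaries require inter-cloudlet messages. The spatial partitioning rule and the overhead proxy are assumptions for illustration, not the paper's exact scheme.

```python
# Count intra- vs. inter-cloudlet edges in a random proximity graph of taxis.
import numpy as np

rng = np.random.default_rng(1)
num_taxis, num_cloudlets = 200, 4
positions = rng.uniform(0, 10, size=(num_taxis, 2))                   # taxi locations
cloudlet_of = (positions[:, 0] // (10 / num_cloudlets)).astype(int)   # simple spatial split

# proximity graph: taxis within 0.8 distance units are connected
edges = [(i, j) for i in range(num_taxis) for j in range(i + 1, num_taxis)
         if np.linalg.norm(positions[i] - positions[j]) < 0.8]

intra = sum(cloudlet_of[i] == cloudlet_of[j] for i, j in edges)
inter = len(edges) - intra
print(f"{len(edges)} edges total: {intra} handled inside cloudlets, "
      f"{inter} need inter-cloudlet messages")
```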
Abstract: Graph neural networks (GNNs) are susceptible to privacy inference attacks (PIAs), given their ability to learn joint representations from the features of and edges among nodes in graph data. To prevent privacy leakage in GNNs, we propose a novel heterogeneous randomized response (HeteroRR) mechanism that protects nodes' features and edges against PIAs under differential privacy (DP) guarantees without an undue cost in data and model utility when training GNNs. Our idea is to balance the importance and sensitivity of nodes' features and edges when redistributing the privacy budgets, since some features and edges are more sensitive or more important to model utility than others. As a result, we derive significantly better randomization probabilities and tighter error bounds at the level of both nodes' features and edges than existing approaches, enabling us to maintain high data utility for training GNNs. An extensive theoretical and empirical analysis using benchmark datasets shows that HeteroRR significantly outperforms various baselines in terms of model utility under rigorous privacy protection for both nodes' features and edges, enabling us to defend against PIAs in DP-preserving GNNs effectively.
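To illustrate the idea of heterogeneous randomized response, the sketch below splits a total privacy budget across binary features in proportion to an importance weight and applies standard binary randomized response per feature, so more important features are flipped less often. The proportional budget split is a generic illustration under basic DP composition; HeteroRR's actual randomization probabilities and error bounds are derived in the paper.

```python
# Generic heterogeneous randomized response over binary features (illustrative).
import numpy as np

def heterogeneous_rr(x_binary, importance, total_epsilon, rng=np.random.default_rng(0)):
    importance = np.asarray(importance, dtype=float)
    eps = total_epsilon * importance / importance.sum()   # per-feature privacy budgets
    keep_prob = np.exp(eps) / (np.exp(eps) + 1.0)         # binary randomized response
    keep = rng.random(len(x_binary)) < keep_prob
    flipped = 1 - np.asarray(x_binary)
    return np.where(keep, x_binary, flipped), eps

noisy, eps = heterogeneous_rr([1, 0, 1, 1], importance=[0.5, 0.1, 0.3, 0.1], total_epsilon=4.0)
print(noisy, eps.round(2))   # important features get larger eps, hence lower flip probability
```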
Abstract: A Trojan backdoor is a poisoning attack against neural network (NN) classifiers in which adversaries exploit the (highly desirable) model-reuse property to implant Trojans into model parameters through a poisoned training process, enabling backdoor breaches. Most proposed defenses against Trojan attacks assume a white-box setup, in which the defender either has access to the inner state of the NN or is able to run back-propagation through it. In this work, we propose a more practical black-box defense, dubbed TrojDef, which only needs to run forward passes of the NN. TrojDef tries to identify and filter out Trojan inputs (i.e., inputs augmented with the Trojan trigger) by monitoring the changes in prediction confidence when the input is repeatedly perturbed by random noise. We derive a function of the prediction outputs, called the prediction confidence bound, to decide whether an input example is Trojan or not. The intuition is that Trojan inputs are more stable, since the misclassification depends only on the trigger, while benign inputs suffer when augmented with noise due to the perturbation of the classification features. Through mathematical analysis, we show that if the attacker is perfect in injecting the backdoor, the Trojan-infected model will be trained to learn the appropriate prediction confidence bound, which is used to distinguish Trojan and benign inputs under arbitrary perturbations. However, because the attacker might not be perfect in injecting the backdoor, we introduce a nonlinear transform of the prediction confidence bound to improve detection accuracy in practical settings. Extensive empirical evaluations show that TrojDef significantly outperforms state-of-the-art defenses and is highly stable under different settings, even when the classifier architecture, the training process, or the hyper-parameters change.
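The sketch below captures the black-box intuition: query the classifier's forward pass on many randomly perturbed copies of an input and measure how stable the prediction confidence stays. Trojan inputs, whose label is pinned by the trigger, tend to remain stable, while benign inputs degrade. The Gaussian noise, stability statistic, and threshold are illustrative assumptions; the paper instead derives a prediction confidence bound (with a nonlinear transform) from the prediction outputs.

```python
# Stability-under-noise heuristic; `predict_proba(x)` is an assumed black-box
# forward pass returning a probability vector.
import numpy as np

def stability_score(predict_proba, x, noise_std=0.1, trials=50,
                    rng=np.random.default_rng(0)):
    base = predict_proba(x)
    base_label = int(np.argmax(base))
    confs = []
    for _ in range(trials):
        noisy = x + rng.normal(0.0, noise_std, size=x.shape)
        confs.append(predict_proba(noisy)[base_label])
    return float(np.mean(confs))   # high mean confidence => suspiciously stable

def is_trojan(predict_proba, x, threshold=0.9):
    return stability_score(predict_proba, x) > threshold
```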
Abstract: This paper explores previously unknown backdoor risks in HyperNet-based personalized federated learning (HyperNetFL) through poisoning attacks. Building on this, we propose a novel model-transferring attack (called HNTROJ), the first of its kind, which transfers a locally backdoor-infected model to all legitimate, personalized local models generated by the HyperNetFL model through consistent and effective malicious local gradients computed across all compromised clients throughout the training process. As a result, HNTROJ reduces the number of compromised clients needed to successfully launch the attack without any observable signs of sudden shifts or degradation in model utility on legitimate data samples, making our attack stealthy. To defend against HNTROJ, we adapted several backdoor-resistant FL training algorithms to HyperNetFL. Extensive experiments carried out on several benchmark datasets show that HNTROJ significantly outperforms data poisoning and model replacement attacks and bypasses robust training algorithms.
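As a hedged sketch of the kind of malicious client update described above, the snippet below shows a compromised client that follows the normal local-gradient protocol but computes its gradient on trigger-poisoned data with the attacker's target label, and does so consistently in every round so the hypernetwork gradually transfers the backdoor into the personalized models it generates. The model interface, poisoning ratio, and data handling are assumptions for illustration, not HNTROJ's exact construction.

```python
# Illustrative malicious local gradient on poisoned data (PyTorch).
import torch

def malicious_local_gradient(personalized_model, loss_fn, clean_batch,
                             trigger, target_label, poison_ratio=0.5):
    x, y = clean_batch
    n_poison = int(poison_ratio * len(x))
    x = x.clone(); y = y.clone()
    x[:n_poison] = x[:n_poison] + trigger    # stamp the backdoor trigger
    y[:n_poison] = target_label              # relabel to the attacker's target class
    personalized_model.zero_grad()
    loss = loss_fn(personalized_model(x), y)
    loss.backward()
    # returned to the server exactly like an honest client's gradient
    return [p.grad.detach().clone() for p in personalized_model.parameters()]
```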
Abstract: The growing use of social media has led to the development of several Machine Learning (ML) and Natural Language Processing (NLP) tools to process the unprecedented amount of social media content and make actionable decisions. However, these ML and NLP algorithms have been widely shown to be vulnerable to adversarial attacks. These vulnerabilities allow adversaries to launch a diversified set of adversarial attacks on these algorithms across different applications of social media text processing. In this paper, we provide a comprehensive review of the main approaches to adversarial attacks and defenses in the context of social media applications, with a particular focus on key challenges and future research directions. In detail, we cover the literature on six key applications, namely (i) rumor detection, (ii) satire detection, (iii) clickbait and spam identification, (iv) hate speech detection, (v) misinformation detection, and (vi) sentiment analysis. We then highlight current and anticipated future research questions and provide recommendations and directions for future work.
Abstract: Traffic management systems capture tremendous amounts of video data and leverage advances in video processing to detect and monitor traffic incidents. The collected data are traditionally forwarded to the traffic management center (TMC) for in-depth analysis, which can overload the network paths to the TMC. To alleviate such bottlenecks, we propose to utilize edge computing by equipping edge nodes that are close to cameras with computing resources (e.g., cloudlets). A cloudlet, with limited computing resources compared to the TMC, provides limited video processing capabilities. In this paper, we focus on two common traffic monitoring tasks, congestion detection and speed detection, and propose a two-tier edge-computing-based model that takes into account both the limited computing capability of cloudlets and the unstable network conditions on the path to the TMC. Our solution utilizes two algorithms for each task, one implemented at the edge and the other at the TMC, designed with their different computing resources in mind. While the TMC provides strong computation power, the video quality it receives depends on the underlying network conditions; the edge, on the other hand, processes very high-quality video but with limited computing resources. Our model captures this trade-off. We evaluate the performance of the proposed two-tier model and the traffic monitoring algorithms via test-bed experiments under different weather and network conditions, and show that our proposed hybrid edge-cloud solution outperforms both the cloud-only and edge-only solutions.
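The toy sketch below captures the trade-off in the two-tier setting: the cloudlet runs a lightweight detector on full-quality frames, while the TMC runs a heavier detector on frames whose quality is limited by the current uplink bandwidth. The accuracy figures and dispatch rule are illustrative assumptions, not the paper's calibrated models or algorithms.

```python
# Illustrative edge-vs-TMC dispatch based on an assumed bandwidth-quality model.
def edge_accuracy(task: str) -> float:
    # lightweight algorithm on full-quality video, limited compute
    return {"congestion": 0.85, "speed": 0.80}[task]

def tmc_accuracy(task: str, bandwidth_mbps: float) -> float:
    # strong algorithm, but degraded by the video quality the network can deliver
    quality = min(1.0, bandwidth_mbps / 10.0)
    return {"congestion": 0.95, "speed": 0.93}[task] * quality

def dispatch(task: str, bandwidth_mbps: float) -> str:
    return "TMC" if tmc_accuracy(task, bandwidth_mbps) >= edge_accuracy(task) else "edge"

for bw in (2.0, 6.0, 12.0):
    print(bw, "Mbps ->", dispatch("congestion", bw), dispatch("speed", bw))
```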
Abstract: In this work, we show how to jointly exploit adversarial perturbation and model poisoning vulnerabilities to practically launch a new stealthy attack, dubbed AdvTrojan. AdvTrojan is stealthy because it is activated only when: 1) a carefully crafted adversarial perturbation is injected into the input examples during inference, and 2) a Trojan backdoor is implanted during the training process of the model. We leverage adversarial noise in the input space to move Trojan-infected examples across the model's decision boundary, making them difficult to detect. This stealthy behavior fools users into accidentally trusting the infected model as a classifier that is robust against adversarial examples. AdvTrojan can be implemented by only poisoning the training data, similar to conventional Trojan backdoor attacks. Our thorough analysis and extensive experiments on several benchmark datasets show that AdvTrojan can bypass existing defenses with a success rate close to 100% in most of our experimental scenarios and can be extended to attack federated learning tasks as well.
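The sketch below illustrates the inference-time activation step described above: the backdoor fires only when a crafted adversarial perturbation is added on top of an input stamped with the Trojan trigger. A single FGSM-style step is used here purely for illustration under assumed model and trigger shapes; AdvTrojan's actual perturbation crafting may differ.

```python
# Illustrative trigger + adversarial-step activation (PyTorch).
import torch

def activate_advtrojan(model, x, trigger, target_label, epsilon=0.03):
    # x: batch of one input; trigger: tensor broadcastable to x's shape (assumptions)
    x_troj = (x + trigger).clone().detach().requires_grad_(True)   # stamp the trigger
    loss = torch.nn.functional.cross_entropy(
        model(x_troj), torch.tensor([target_label]))
    loss.backward()
    # one signed-gradient step toward the target class on top of the trigger
    x_attack = (x_troj - epsilon * x_troj.grad.sign()).detach().clamp(0, 1)
    return x_attack
```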