Abstract:This paper addresses a significant gap in Autonomous Cyber Operations (ACO) literature: the absence of effective edge-blocking ACO strategies in dynamic, real-world networks. It specifically targets the cybersecurity vulnerabilities of organizational Active Directory (AD) systems. Unlike the existing literature on edge-blocking defenses which considers AD systems as static entities, our study counters this by recognizing their dynamic nature and developing advanced edge-blocking defenses through a Stackelberg game model between attacker and defender. We devise a Reinforcement Learning (RL)-based attack strategy and an RL-assisted Evolutionary Diversity Optimization-based defense strategy, where the attacker and defender improve each other strategy via parallel gameplay. To address the computational challenges of training attacker-defender strategies on numerous dynamic AD graphs, we propose an RL Training Facilitator that prunes environments and neural networks to eliminate irrelevant elements, enabling efficient and scalable training for large graphs. We extensively train the attacker strategy, as a sophisticated attacker model is essential for a robust defense. Our empirical results successfully demonstrate that our proposed approach enhances defender's proficiency in hardening dynamic AD graphs while ensuring scalability for large-scale AD.
Abstract:Microsoft Active Directory (AD) is the default security management system for Window domain network. We study the problem of placing decoys in AD network to detect potential attacks. We model the problem as a Stackelberg game between an attacker and a defender on AD attack graphs where the defender employs a set of decoys to detect the attacker on their way to Domain Admin (DA). Contrary to previous works, we consider time-varying (temporal) attack graphs. We proposed a novel metric called response time, to measure the effectiveness of our decoy placement in temporal attack graphs. Response time is defined as the duration from the moment attackers trigger the first decoy to when they compromise the DA. Our goal is to maximize the defender's response time to the worst-case attack paths. We establish the NP-hard nature of the defender's optimization problem, leading us to develop Evolutionary Diversity Optimization (EDO) algorithms. EDO algorithms identify diverse sets of high-quality solutions for the optimization problem. Despite the polynomial nature of the fitness function, it proves experimentally slow for larger graphs. To enhance scalability, we proposed an algorithm that exploits the static nature of AD infrastructure in the temporal setting. Then, we introduce tailored repair operations, ensuring the convergence to better results while maintaining scalability for larger graphs.
Abstract:In many collaborative settings, artificial intelligence (AI) agents must be able to adapt to new teammates that use unknown or previously unobserved strategies. While often simple for humans, this can be challenging for AI agents. For example, if an AI agent learns to drive alongside others (a training set) that only drive on one side of the road, it may struggle to adapt this experience to coordinate with drivers on the opposite side, even if their behaviours are simply flipped along the left-right symmetry. To address this we introduce symmetry-breaking augmentations (SBA), which increases diversity in the behaviour of training teammates by applying a symmetry-flipping operation. By learning a best-response to the augmented set of teammates, our agent is exposed to a wider range of behavioural conventions, improving performance when deployed with novel teammates. We demonstrate this experimentally in two settings, and show that our approach improves upon previous ad hoc teamwork results in the challenging card game Hanabi. We also propose a general metric for estimating symmetry-dependency amongst a given set of policies.
Abstract:Creating diverse sets of high quality solutions has become an important problem in recent years. Previous works on diverse solutions problems consider solutions' objective quality and diversity where one is regarded as the optimization goal and the other as the constraint. In this paper, we treat this problem as a bi-objective optimization problem, which is to obtain a range of quality-diversity trade-offs. To address this problem, we frame the evolutionary process as evolving a population of populations, and present a suitable general implementation scheme that is compatible with existing evolutionary multi-objective search methods. We realize the scheme in NSGA-II and SPEA2, and test the methods on various instances of maximum coverage, maximum cut and minimum vertex cover problems. The resulting non-dominated populations exhibit rich qualitative features, giving insights into the optimization instances and the quality-diversity trade-offs they induce.
Abstract:We study a Stackelberg game between an attacker and a defender on large Active Directory (AD) attack graphs where the defender employs a set of honeypots to stop the attacker from reaching high-value targets. Contrary to existing works that focus on small and static attack graphs, AD graphs typically contain hundreds of thousands of nodes and edges and constantly change over time. We consider two types of attackers: a simple attacker who cannot observe honeypots and a competent attacker who can. To jointly solve the game, we propose a mixed-integer programming (MIP) formulation. We observed that the optimal blocking plan for static graphs performs poorly in dynamic graphs. To solve the dynamic graph problem, we re-design the mixed-integer programming formulation by combining m MIP (dyMIP(m)) instances to produce a near-optimal blocking plan. Furthermore, to handle a large number of dynamic graph instances, we use a clustering algorithm to efficiently find the m-most representative graph instances for a constant m (dyMIP(m)). We prove a lower bound on the optimal blocking strategy for dynamic graphs and show that our dyMIP(m) algorithms produce close to optimal results for a range of AD graphs under realistic conditions.
Abstract:Large-Language Models (LLMs) have shifted the paradigm of natural language data processing. However, their black-boxed and probabilistic characteristics can lead to potential risks in the quality of outputs in diverse LLM applications. Recent studies have tested Quality Attributes (QAs), such as robustness or fairness, of LLMs by generating adversarial input texts. However, existing studies have limited their coverage of QAs and tasks in LLMs and are difficult to extend. Additionally, these studies have only used one evaluation metric, Attack Success Rate (ASR), to assess the effectiveness of their approaches. We propose a MEtamorphic Testing for Analyzing LLMs (METAL) framework to address these issues by applying Metamorphic Testing (MT) techniques. This approach facilitates the systematic testing of LLM qualities by defining Metamorphic Relations (MRs), which serve as modularized evaluation metrics. The METAL framework can automatically generate hundreds of MRs from templates that cover various QAs and tasks. In addition, we introduced novel metrics that integrate the ASR method into the semantic qualities of text to assess the effectiveness of MRs accurately. Through the experiments conducted with three prominent LLMs, we have confirmed that the METAL framework effectively evaluates essential QAs on primary LLM tasks and reveals the quality risks in LLMs. Moreover, the newly proposed metrics can guide the optimal MRs for testing each task and suggest the most effective method for generating MRs.
Abstract:We study a Stackelberg game between one attacker and one defender in a configurable environment. The defender picks a specific environment configuration. The attacker observes the configuration and attacks via Reinforcement Learning (RL trained against the observed environment). The defender's goal is to find the environment with minimum achievable reward for the attacker. We apply Evolutionary Diversity Optimization (EDO) to generate diverse population of environments for training. Environments with clearly high rewards are killed off and replaced by new offsprings to avoid wasting training time. Diversity not only improves training quality but also fits well with our RL scenario: RL agents tend to improve gradually, so a slightly worse environment earlier on may become better later. We demonstrate the effectiveness of our approach by focusing on a specific application, Active Directory (AD). AD is the default security management system for Windows domain networks. AD environment describes an attack graph, where nodes represent computers/accounts/etc., and edges represent accesses. The attacker aims to find the best attack path to reach the highest-privilege node. The defender can change the graph by removing a limited number of edges (revoke accesses). Our approach generates better defensive plans than the existing approach and scales better.
Abstract:We propose a combinatorial optimisation model called Limited Query Graph Connectivity Test. We consider a graph whose edges have two possible states (on/off). The edges' states are hidden initially. We could query an edge to reveal its state. Given a source s and a destination t, we aim to test s-t connectivity by identifying either a path (consisting of only on edges) or a cut (consisting of only off edges). We are limited to B queries, after which we stop regardless of whether graph connectivity is established. We aim to design a query policy that minimizes the expected number of queries. If we remove the query limit B (i.e., by setting B to the total number of edges), then our problem becomes a special case of (monotone) Stochastic Boolean Function Evaluation (SBFE). There are two existing exact algorithms that are prohibitively expensive. They have best known upper bounds of O(3^m) and O(2^{2^k}) respectively, where m is the number of edges and k is the number of paths/cuts. These algorithms do not scale well in practice. We propose a significantly more scalable exact algorithm. Our exact algorithm works by iteratively improving the performance lower bound until the lower bound becomes achievable. Even when our exact algorithm does not scale, it can be used as an anytime algorithm for calculating lower bound. We experiment on a wide range of practical graphs. We observe that even for large graphs (i.e., tens of thousands of edges), it mostly takes only a few queries to reach conclusion, which is the practical motivation behind the query limit B. B is also an algorithm parameter that controls scalability. For small B, our exact algorithm scales well. For large B, our exact algorithm can be converted to a heuristic (i.e., always pretend that there are only 5 queries left). Our heuristic outperforms all existing heuristics ported from SBFE and related literature.
Abstract:In this paper, we present the Multi-Forgery Detection Challenge held concurrently with the IEEE Computer Society Workshop on Biometrics at CVPR 2022. Our Multi-Forgery Detection Challenge aims to detect automatic image manipulations including but not limited to image editing, image synthesis, image generation, image photoshop, etc. Our challenge has attracted 674 teams from all over the world, with about 2000 valid result submission counts. We invited the Top 10 teams to present their solutions to the challenge, from which three teams are awarded prizes in the grand finale. In this paper, we present the solutions from the Top 3 teams, in order to boost the research work in the field of image forgery detection.
Abstract:Active Directory (AD) is the default security management system for Windows domain networks. We study a Stackelberg game model between one attacker and one defender on an AD attack graph. The attacker initially has access to a set of entry nodes. The attacker can expand this set by strategically exploring edges. Every edge has a detection rate and a failure rate. The attacker aims to maximize their chance of successfully reaching the destination before getting detected. The defender's task is to block a constant number of edges to decrease the attacker's chance of success. We show that the problem is #P-hard and, therefore, intractable to solve exactly. We convert the attacker's problem to an exponential sized Dynamic Program that is approximated by a Neural Network (NN). Once trained, the NN provides an efficient fitness function for the defender's Evolutionary Diversity Optimisation (EDO). The diversity emphasis on the defender's solution provides a diverse set of training samples, which improves the training accuracy of our NN for modelling the attacker. We go back and forth between NN training and EDO. Experimental results show that for R500 graph, our proposed EDO based defense is less than 1% away from the optimal defense.