Abstract:The integration of the Internet of Things (IoT) into Cyber-Physical Systems (CPSs) has expanded their cyber-attack surface, introducing new and sophisticated threats with potential to exploit emerging vulnerabilities. Assessing the risks of CPSs is increasingly difficult due to incomplete and outdated cybersecurity knowledge. This highlights the urgent need for better-informed risk assessments and mitigation strategies. While previous efforts have relied on rule-based natural language processing (NLP) tools to map vulnerabilities, weaknesses, and attack patterns, recent advancements in Large Language Models (LLMs) present a unique opportunity to enhance cyber-attack knowledge completion through improved reasoning, inference, and summarization capabilities. We apply embedding models to encapsulate information on attack patterns and adversarial techniques, generating mappings between them using vector embeddings. Additionally, we propose a Retrieval-Augmented Generation (RAG)-based approach that leverages pre-trained models to create structured mappings between different taxonomies of threat patterns. Further, we use a small hand-labeled dataset to compare the proposed RAG-based approach to a baseline standard binary classification model. Thus, the proposed approach provides a comprehensive framework to address the challenge of cyber-attack knowledge graph completion.
Abstract:The abundance of cyber-physical components in modern day power grid with their diverse hardware and software vulnerabilities has made it difficult to protect them from advanced persistent threats (APTs). An attack graph depicting the propagation of potential cyber-attack sequences from the initial access point to the end objective is vital to identify critical weaknesses of any cyber-physical system. A cyber security personnel can accordingly plan preventive mitigation measures for the identified weaknesses addressing the cyber-attack sequences. However, limitations on available cybersecurity budget restrict the choice of mitigation measures. We address this aspect through our framework, which solves the following problem: given potential cyber-attack sequences for a cyber-physical component in the power grid, find the optimal manner to allocate an available budget to implement necessary preventive mitigation measures. We formulate the problem as a mixed integer linear program (MILP) to identify the optimal budget partition and set of mitigation measures which minimize the vulnerability of cyber-physical components to potential attack sequences. We assume that the allocation of budget affects the efficacy of the mitigation measures. We show how altering the budget allocation for tasks such as asset management, cybersecurity infrastructure improvement, incident response planning and employee training affects the choice of the optimal set of preventive mitigation measures and modifies the associated cybersecurity risk. The proposed framework can be used by cyber policymakers and system owners to allocate optimal budgets for various tasks required to improve the overall security of a cyber-physical system.
Abstract:Networks are a fundamental and flexible way of representing various complex systems. Many domains such as communication, citation, procurement, biology, social media, and transportation can be modeled as a set of entities and their relationships. Temporal networks are a specialization of general networks where the temporal evolution of the system is as important to understand as the structure of the entities and relationships. We present the Independent Temporal Motif (ITeM) to characterize temporal graphs from different domains. The ITeMs are edge-disjoint temporal motifs that can be used to model the structure and the evolution of the graph. For a given temporal graph, we produce a feature vector of ITeM frequencies and apply this distribution to the task of measuring the similarity of temporal graphs. We show that ITeM has higher accuracy than other motif frequency-based approaches. We define various metrics based on ITeM that reveal salient properties of a temporal network. We also present importance sampling as a method for efficiently estimating the ITeM counts. We evaluate our approach on both synthetic and real temporal networks.
Abstract:Link prediction, or predicting the likelihood of a link in a knowledge graph based on its existing state is a key research task. It differs from a traditional link prediction task in that the links in a knowledge graph are categorized into different predicates and the link prediction performance of different predicates in a knowledge graph generally varies widely. In this work, we propose a latent feature embedding based link prediction model which considers the prediction task for each predicate disjointly. To learn the model parameters it utilizes a Bayesian personalized ranking based optimization technique. Experimental results on large-scale knowledge bases such as YAGO2 show that our link prediction approach achieves substantially higher performance than several state-of-art approaches. We also show that for a given predicate the topological properties of the knowledge graph induced by the given predicate edges are key indicators of the link prediction performance of that predicate in the knowledge graph.