Abstract:Continuous-time dynamic graphs (CTDGs) are essential for modeling interconnected, evolving systems. Traditional methods for extracting knowledge from these graphs often depend on feature engineering or deep learning. Feature engineering is limited by the manual, time-intensive nature of crafting features, while deep learning approaches suffer from high inference latency, making them impractical for real-time applications. This paper introduces Deep-Graph-Sprints (DGS), a novel deep learning architecture designed for efficient representation learning on CTDGs with low-latency inference requirements. We benchmark DGS against state-of-the-art feature engineering and graph neural network methods using five diverse datasets. The results indicate that DGS achieves competitive performance while improving inference speed by up to 12x compared to other deep learning approaches on our tested benchmarks. Our method effectively bridges the gap between deep representation learning and low-latency application requirements for CTDGs.
Abstract:Unmanned Aerial Vehicles (UAVs) are increasingly used to enable wireless communications. Due to characteristics such as the ability to hover and carry cargo, UAVs can serve as communications nodes, including Wi-Fi Access Points and Cellular Base Stations. In previous work, we proposed the Sustainable multi-UAV Performance-aware Placement (SUPPLY) algorithm, which focuses on the energy-efficient placement of multiple UAVs acting as Flying Access Points (FAPs). Additionally, we developed the Multi-UAV Energy Consumption (MUAVE) simulator to evaluate UAV energy consumption, specifically when using the SUPPLY algorithm. However, MUAVE was initially designed to compute the energy consumption of rotary-wing UAVs only. In this paper, we propose eMUAVE, an enhanced version of the MUAVE simulator that allows evaluating the energy consumption of both rotary-wing and fixed-wing UAVs. Our energy consumption evaluation using eMUAVE considers both reference and random networking scenarios. The results show that fixed-wing UAVs can be employed in the majority of networking scenarios. However, rotary-wing UAVs are typically more energy-efficient than fixed-wing UAVs when following the trajectories defined by SUPPLY.
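The abstract does not detail eMUAVE's internal energy models. As a minimal sketch of why the two airframes behave differently, the code below uses propulsion power models commonly cited in the UAV communications literature (rotary-wing: Zeng et al.; fixed-wing: Zeng and Zhang) with illustrative constants; it is an assumption that eMUAVE implements models of this family.

```python
import math

# Hedged sketch: commonly cited UAV propulsion power models with
# illustrative constants; eMUAVE's actual models may differ.

def rotary_wing_power(v):
    """Propulsion power (W) of a rotary-wing UAV at forward speed v (m/s)."""
    P0, Pi = 79.86, 88.63          # blade-profile and induced power in hover
    U_tip, v0 = 120.0, 4.03        # rotor tip speed, mean rotor induced velocity in hover
    d0, rho, s, A = 0.6, 1.225, 0.05, 0.503  # drag ratio, air density, rotor solidity, disc area
    blade = P0 * (1 + 3 * v**2 / U_tip**2)
    induced = Pi * math.sqrt(max(math.sqrt(1 + v**4 / (4 * v0**4)) - v**2 / (2 * v0**2), 0.0))
    parasite = 0.5 * d0 * rho * s * A * v**3
    return blade + induced + parasite

def fixed_wing_power(v):
    """Propulsion power (W) of a fixed-wing UAV in level flight at speed v (m/s)."""
    c1, c2 = 9.26e-4, 2250.0       # parasitic and induced drag coefficients
    return c1 * v**3 + c2 / v      # diverges as v -> 0: a fixed-wing UAV cannot hover

for v in (5.0, 15.0, 30.0):
    print(f"v={v:5.1f} m/s  rotary={rotary_wing_power(v):7.1f} W  fixed={fixed_wing_power(v):7.1f} W")
```

In models of this family, the fixed-wing power term has no hover mode, which is consistent with the abstract's finding that fixed-wing UAVs suit many but not all scenarios, while rotary-wing UAVs fare better on SUPPLY's trajectories.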
Abstract:Unmanned Aerial Vehicles (UAVs) are used for a wide range of applications. Due to characteristics such as the ability to hover and carry cargo on board, rotary-wing UAVs have been considered suitable platforms for carrying communications nodes, including Wi-Fi Access Points and cellular Base Stations. This gave rise to the concept of Flying Networks (FNs), now part of the so-called Non-Terrestrial Networks (NTNs) defined by 3GPP. In scenarios where the deployment of terrestrial networks is not feasible, FNs have emerged as a solution to provide wireless connectivity. However, the management of communications resources in FNs imposes significant challenges, especially regarding the positioning of the UAVs so that the Quality of Service (QoS) offered to the Ground Users (GUs) and devices is maximized. Moreover, unlike terrestrial networks that are directly connected to the power grid, UAVs typically rely on on-board batteries that need to be recharged. To maximize the UAVs' flying time, the energy they consume needs to be minimized. When it comes to multi-UAV placement, most state-of-the-art solutions focus on maximizing the coverage area and assume that the UAVs keep hovering in a fixed position while serving GUs. They also do not address the energy-aware multi-UAV placement problem in networking scenarios where the GUs may have different QoS requirements and may not be uniformly distributed across the area of interest. In this work, we propose the Sustainable multi-UAV Performance-aware Placement (SUPPLY) algorithm, which determines the energy- and performance-aware positioning of multiple UAVs in an FN. To accomplish this, SUPPLY defines trajectories that minimize the UAVs' energy consumption while ensuring the targeted QoS levels. The obtained results show up to 25% energy consumption reduction with minimal impact on throughput and delay.
Abstract:Machine learning methods to aid defence systems in detecting malicious activity typically rely on labelled data. In some domains, such labelled data is unavailable or incomplete. In practice, this can lead to low detection rates and high false-positive rates, which characterise, for example, anti-money laundering systems. In fact, it is estimated that 1.7--4 trillion euros are laundered annually and go undetected. We propose The GANfather, a method to generate samples with properties of malicious activity without requiring labels. We reward the generation of malicious samples by introducing an extra objective into the typical Generative Adversarial Network (GAN) loss. Ultimately, our goal is to enhance the detection of illicit activity by using the discriminator network as a novel and robust defence system. Optionally, we may encourage the generator to bypass pre-existing detection systems; this setup then reveals defensive weaknesses for the discriminator to correct. We evaluate our method in two real-world use cases, money laundering and recommendation systems. In the former, our method moves cumulative amounts close to 350 thousand dollars through a network of accounts without being detected by an existing system. In the latter, we recommend the target item to a broad user base with as few as 30 synthetic attackers. In both cases, we train a new defence system to capture the synthetic attacks.
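The abstract does not give the exact loss functions; the following minimal PyTorch sketch conveys the idea of adding an extra objective to the generator's loss, plus an optional penalty for triggering an existing detector. The names generator, discriminator, detector, and malice_score are hypothetical placeholders, not the paper's API.

```python
import torch

def generator_step(generator, discriminator, malice_score, z,
                   lam=1.0, detector=None, mu=1.0):
    # Hedged sketch: all networks and weights here are illustrative.
    fake = generator(z)
    # Standard non-saturating GAN generator loss (discriminator outputs a probability).
    adv_loss = -torch.log(discriminator(fake) + 1e-8).mean()
    # Extra objective: reward "malicious" properties, e.g. total amount moved.
    objective = -malice_score(fake).mean()
    loss = adv_loss + lam * objective
    if detector is not None:
        # Optionally push generated samples below an existing system's alert score.
        loss = loss + mu * detector(fake).mean()
    return loss
```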
Abstract:Many real-world datasets have an underlying dynamic graph structure, where entities and their interactions evolve over time. Machine learning models should consider these dynamics in order to harness their full potential in downstream tasks. Previous approaches for graph representation learning have focused on either sampling k-hop neighborhoods, akin to breadth-first search, or random walks, akin to depth-first search. However, these methods are computationally expensive and unsuitable for real-time, low-latency inference on dynamic graphs. To overcome these limitations, we propose graph-sprints, a general-purpose feature extraction framework for continuous-time dynamic graphs (CTDGs) that has low latency and is competitive with state-of-the-art, higher-latency models. To achieve this, we propose a streaming, low-latency approximation of random-walk-based features. In our framework, time-aware node embeddings summarizing multi-hop information are computed using only single-hop operations on the incoming edges. We evaluate our proposed approach on three open-source datasets and two in-house datasets, and compare it with three state-of-the-art algorithms (TGN-attn, TGN-ID, Jodie). We demonstrate that our graph-sprints features, combined with a machine learning classifier, achieve competitive performance (outperforming all baselines on the node classification tasks across the five datasets). Simultaneously, graph-sprints significantly reduces inference latency, achieving close to an order-of-magnitude speed-up in our experimental setting.
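The single-hop update is the core of the framework; a minimal streaming sketch in its spirit is below. The exponential time decay and convex mixing rule are assumptions for illustration, not the paper's exact update.

```python
import math
from collections import defaultdict

DIM, ALPHA, TAU = 16, 0.5, 3600.0   # embedding size, mixing weight, decay scale (s); illustrative

state = defaultdict(lambda: [0.0] * DIM)   # per-node time-aware embedding
last_seen = defaultdict(float)

def on_edge(src, dst, t, edge_feats):
    """Process one incoming edge (src -> dst at time t); edge_feats has length DIM."""
    decay = math.exp(-(t - last_seen[dst]) / TAU)   # forget stale information
    s, d = state[src], state[dst]
    # The destination mixes its decayed state with the source state, so multi-hop
    # information accumulates over time using only O(DIM) work per edge.
    state[dst] = [decay * (ALPHA * dv + (1 - ALPHA) * sv) + f
                  for dv, sv, f in zip(d, s, edge_feats)]
    last_seen[dst] = t
```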
Abstract:In many evolutionary computation systems, parent selection methods can affect, among other things, convergence to a solution. In this paper, we present a study comparing the role of two commonly used parent selection methods in evolving machine learning pipelines in an automated machine learning system called the Tree-based Pipeline Optimization Tool (TPOT). Specifically, we demonstrate, using experiments on multiple datasets, that lexicase selection leads to significantly faster convergence than NSGA-II in TPOT. We also compare how these selection methods explore the search space, using a trie data structure that records the pipelines explored in a particular run.
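For reference, standard lexicase selection (as introduced by Spector and colleagues) fits in a few lines; how TPOT wires it into pipeline evolution is not shown here. errors[i][c] is the error of candidate i on training case c, lower being better.

```python
import random

def lexicase_select(errors):
    """Return the index of one parent chosen via lexicase selection."""
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    random.shuffle(cases)            # fresh random case ordering per selection event
    for c in cases:
        best = min(errors[i][c] for i in candidates)
        candidates = [i for i in candidates if errors[i][c] == best]
        if len(candidates) == 1:     # a single survivor wins immediately
            break
    return random.choice(candidates) # ties after all cases: pick uniformly
```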
Abstract:As a general trend in industrial robotics, an increasing number of safety functions are being developed or re-engineered to be handled in software rather than by physical hardware such as safety relays or interlock circuits. This trend reinforces the importance of supplementing the traditional, input-based testing and quality procedures widely used in industry today with formal verification and model-checking methods. To this end, this paper focuses on a representative safety-critical system in an ABB industrial paint robot, namely the High-Voltage electrostatic Control system (HVC). The practical convergence of the high voltage produced by the HVC, essential for safe operation, is formally verified using a novel and general co-verification framework in which hardware and software models are related via platform mappings. This approach enables the pragmatic combination of highly diverse and specialised tools. The paper's main contributions include details on how hardware abstraction and verification results can be transferred between tools in order to verify system-level safety properties. Notably, the HVC application considered in this paper has the rather generic form of a feedback controller, so the co-verification framework and experiences reported here are also highly relevant for any cyber-physical system tracking a setpoint reference.
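The HVC models, gains, and tooling are not reproduced in the abstract. As a toy illustration of the verified property only (practical convergence to a setpoint), the sketch below bounded-checks a generic discrete proportional loop; all constants are made up.

```python
def converges(setpoint, y0=0.0, kp=0.3, steps=200, tol=0.01):
    """Bounded check: does a simple proportional loop settle within tol of the setpoint?"""
    y = y0
    for _ in range(steps):
        y += kp * (setpoint - y)        # plant + controller collapsed into one update
        if abs(setpoint - y) <= tol:
            return True
    return False                        # did not settle within the step bound

assert converges(60_000.0)              # e.g. a hypothetical 60 kV setpoint
```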
Abstract:Money laundering is a global problem that concerns legitimizing proceeds from serious felonies such as drug dealing, human trafficking, or corruption, estimated at 1.7--4 trillion euros annually. The anti-money laundering systems deployed by financial institutions typically comprise rules aligned with regulatory frameworks. Human investigators review the alerts and report suspicious cases. Such systems suffer from high false-positive rates, undermining their effectiveness and resulting in high operational costs. We propose a machine learning triage model, which complements the rule-based system and learns to predict the risk of an alert accurately. Our model uses both entity-centric engineered features and attributes characterizing inter-entity relations in the form of graph-based features. We leverage time windows to construct the dynamic graph, optimizing for time and space efficiency. We validate our model on a real-world banking dataset and show that the triage model can reduce the number of false positives by 80% while detecting over 90% of true positives. In this way, our model can significantly improve anti-money laundering operations.
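The abstract does not list the exact graph-based features; the sketch below shows one plausible way to derive window-based features of that kind (degrees and amount aggregates are illustrative choices, not the paper's feature set). txs is assumed to be an iterable of (timestamp, src_account, dst_account, amount) tuples.

```python
from collections import defaultdict

def graph_features(txs, account, t_alert, window=7 * 24 * 3600):
    """Illustrative graph features for `account` over the window ending at t_alert."""
    out_amt, in_amt = defaultdict(float), defaultdict(float)
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for t, src, dst, amt in txs:
        if t_alert - window <= t <= t_alert:        # keep only the recent window
            out_amt[src] += amt; in_amt[dst] += amt
            out_nbrs[src].add(dst); in_nbrs[dst].add(src)
    return {
        "out_degree": len(out_nbrs[account]),
        "in_degree": len(in_nbrs[account]),
        "total_sent": out_amt[account],
        "total_received": in_amt[account],
        "flow_ratio": out_amt[account] / (in_amt[account] + 1e-9),  # pass-through signal
    }
```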
Abstract:Time series data are ubiquitous in several domains, such as climate, economics, and health care. Mining features from these time series is a crucial task with a multidisciplinary impact. Usually, these features are obtained from structural characteristics of time series, such as trend, seasonality and autocorrelation, sometimes requiring data transformations and parametric models. A recent conceptual approach relies on mapping time series to complex networks, where network science methodologies can help characterize time series. In this paper, we consider two mapping concepts, visibility and transition probability, and propose network topological measures as a new set of time series features. To evaluate the usefulness of the proposed features, we address the problem of time series clustering. More specifically, we propose a clustering method that consists of mapping the time series into visibility graphs and quantile graphs, calculating global topological metrics of the resulting networks, and using data mining techniques to form clusters. We apply this method to data sets of synthetic and empirical time series. The results indicate that network-based features capture the information encoded in each of the time series models, resulting in high clustering accuracy. Our results are promising and show that network analysis can be used to characterize different types of time series and that different mapping methods capture different characteristics of the time series.
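The natural visibility graph mapping (Lacasa et al.) and a few global metrics of the kind used as clustering features can be sketched as follows; the transition-based quantile-graph construction is sketched after the next abstract. The example series and the choice of metrics are illustrative.

```python
import networkx as nx

def visibility_graph(series):
    """Natural visibility graph: two points are linked if they 'see' each other."""
    g = nx.Graph()
    g.add_nodes_from(range(len(series)))
    for a in range(len(series)):
        for b in range(a + 1, len(series)):
            # (a, y_a) sees (b, y_b) iff every point between lies strictly below the connecting line
            if all(series[c] < series[b] + (series[a] - series[b]) * (b - c) / (b - a)
                   for c in range(a + 1, b)):
                g.add_edge(a, b)
    return g

g = visibility_graph([1.0, 3.0, 2.0, 5.0, 1.0, 4.0])
features = {                                      # illustrative global topological features
    "avg_degree": sum(d for _, d in g.degree()) / g.number_of_nodes(),
    "clustering": nx.average_clustering(g),
    "avg_path_len": nx.average_shortest_path_length(g),
}
```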
Abstract:There is nowadays a constant flux of data being generated and collected in all types of real-world systems. These data sets are often indexed by time, space, or both, requiring appropriate approaches to analyze the data. In univariate settings, time series analysis is a mature and solid field. However, in multivariate contexts, time series analysis still presents many limitations. To address these issues, the last decade has brought approaches based on network science. These methods transform an initial time series data set into one or more networks, which can be analyzed in depth to provide insight into the original time series. This review provides a comprehensive overview of existing mapping methods for transforming time series into networks, for a wide audience of researchers and practitioners in machine learning, data mining, and time series. Our main contribution is a structured review of existing methodologies, identifying their main characteristics and their differences. We describe the main conceptual approaches, provide authoritative references, and give insight into their advantages and limitations in a unified notation and language. We first describe the case of univariate time series, which can be mapped to single-layer networks, and divide the current mappings based on the underlying concept: visibility, transition, and proximity. We then proceed to multivariate time series, discussing both single-layer and multiple-layer approaches. Although still very recent, this research area has much potential, and with this survey we intend to pave the way for future research on the topic.
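To complement the visibility sketch above, the transition concept can be illustrated with a quantile graph (Campanharo et al.): observations are binned into Q quantiles, and a weighted directed graph records transitions between consecutive bins. The bin count below is an illustrative choice.

```python
import numpy as np

def quantile_graph(series, q=4):
    """Transition-probability weight matrix of the quantile graph of a series."""
    series = np.asarray(series, dtype=float)
    cuts = np.quantile(series, np.linspace(0, 1, q + 1)[1:-1])  # interior quantile cut points
    labels = np.digitize(series, cuts)                          # quantile bin (0..q-1) per observation
    w = np.zeros((q, q))
    for a, b in zip(labels[:-1], labels[1:]):                   # count consecutive-bin transitions
        w[a, b] += 1
    return w / w.sum()                                          # edge weights as transition probabilities
```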