Pablo Moriano

Oak Ridge National Laboratory

Temporal cross-validation impacts multivariate time series subsequence anomaly detection evaluation

Jun 13, 2025

Steven C. Hespeler, Pablo Moriano, Mingyan Li, Samuel C. Hollifield

Abstract:Evaluating anomaly detection in multivariate time series (MTS) requires careful consideration of temporal dependencies, particularly when detecting subsequence anomalies common in fault detection scenarios. While time series cross-validation (TSCV) techniques aim to preserve temporal ordering during model evaluation, their impact on classifier performance remains underexplored. This study systematically investigates the effect of TSCV strategy on the precision-recall characteristics of classifiers trained to detect fault-like anomalies in MTS datasets. We compare walk-forward (WF) and sliding window (SW) methods across a range of validation partition configurations and classifier types, including shallow learners and deep learning (DL) classifiers. Results show that SW consistently yields higher median AUC-PR scores and reduced fold-to-fold performance variance, particularly for deep architectures sensitive to localized temporal continuity. Furthermore, we find that classifier generalization is sensitive to the number and structure of temporal partitions, with overlapping windows preserving fault signatures more effectively at lower fold counts. A classifier-level stratified analysis reveals that certain algorithms, such as random forests (RF), maintain stable performance across validation schemes, whereas others exhibit marked sensitivity. This study demonstrates the importance of TSCV design in benchmarking anomaly detection models on streaming time series and provides guidance for selecting evaluation strategies in temporally structured learning environments.

* 22 pages, 6 figures, 5 tables
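
To make the difference between the two schemes concrete, the following Python sketch generates fold indices for each. It is an illustration under stated assumptions, not the authors' code: the function names and the parameters `n_folds`, `test_size`, and `train_size` are hypothetical, and the fold layout follows the convention of scikit-learn's `TimeSeriesSplit` (test blocks fill the end of the series; capping the training length with `max_train_size` gives the sliding variant).

```python
def walk_forward_splits(n_samples, n_folds, test_size):
    """Walk-forward (expanding window): each fold trains on everything before its test block."""
    first_test = n_samples - n_folds * test_size
    if first_test <= 0:
        raise ValueError("not enough samples for the requested folds")
    folds = []
    for k in range(n_folds):
        start = first_test + k * test_size
        folds.append((range(0, start), range(start, start + test_size)))
    return folds


def sliding_window_splits(n_samples, n_folds, test_size, train_size):
    """Sliding window: each fold trains on a fixed-length block right before its test block."""
    first_test = n_samples - n_folds * test_size
    if train_size <= 0 or first_test < train_size:
        raise ValueError("training window does not fit before the first test block")
    folds = []
    for k in range(n_folds):
        start = first_test + k * test_size
        folds.append((range(start - train_size, start), range(start, start + test_size)))
    return folds


# Hypothetical example: 100 time steps, 4 folds, 10-step test blocks, 30-step SW training window.
for (wf_train, test), (sw_train, _) in zip(
    walk_forward_splits(100, 4, 10), sliding_window_splits(100, 4, 10, 30)
):
    print(f"test {test.start}-{test.stop - 1}: WF train {len(wf_train)}, SW train {len(sw_train)}")
```

In both schemes every training index precedes every test index of the same fold. Walk-forward training sets grow with each fold, while sliding-window training sets keep a fixed length and overlap from fold to fold whenever `train_size` exceeds `test_size`.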

Electrical Load Forecasting over Multihop Smart Metering Networks with Federated Learning

Feb 24, 2025

Ratun Rahman, Pablo Moriano, Samee U. Khan, Dinh C. Nguyen

Abstract:Electric load forecasting is essential for power management and stability in smart grids. This is mainly achieved via advanced metering infrastructure, where smart meters (SMs) record household energy data. Traditional machine learning (ML) methods are often employed for load forecasting but require data sharing, which raises data privacy concerns. Federated learning (FL) can address this issue by running distributed ML models at local SMs without data exchange. However, current FL-based approaches struggle to achieve efficient load forecasting due to imbalanced data distribution across heterogeneous SMs. This paper presents a novel personalized federated learning (PFL) method for high-quality load forecasting in metering networks. A meta-learning-based strategy is developed to address data heterogeneity at local SMs in the collaborative training of local load forecasting models. Moreover, to minimize the load forecasting delays in our PFL model, we study a new latency optimization problem based on optimal resource allocation at SMs. A theoretical convergence analysis is also conducted to provide insights into FL design for federated load forecasting. Extensive simulations on real-world datasets show that our method outperforms existing approaches in both load forecasting accuracy and operational latency cost.

* arXiv admin note: text overlap with arXiv:2411.10619
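
For orientation, the sketch below shows two generic ingredients that personalized federated load forecasting builds on: sample-weighted federated averaging (FedAvg) of the model parameters sent by the smart meters, followed by a few local gradient steps that adapt the shared model to one meter's data, in the style of MAML-type personalization. This is an assumption for illustration only; it is not the paper's meta-learning strategy, and the latency optimization is not modeled. Parameters are plain Python lists, and `grad_fn` is a hypothetical callback.

```python
def fedavg(client_weights, client_sizes):
    """Sample-weighted average of per-client parameter vectors (FedAvg aggregation)."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for i in range(dim):
            # Meters with more local samples pull the global model harder.
            global_w[i] += (n / total) * w[i]
    return global_w


def personalize(global_w, grad_fn, lr=0.1, steps=3):
    """Adapt the shared model to one meter's data with a few local gradient steps.

    `grad_fn(w)` is a hypothetical callback returning the local loss gradient at w.
    """
    w = list(global_w)
    for _ in range(steps):
        g = grad_fn(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w
```

Because the second meter holds three times as many samples, `fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])` returns `[2.5, 3.5]`, closer to that meter's parameters.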

Detecting Masquerade Attacks in Controller Area Networks Using Graph Machine Learning

Aug 10, 2024

William Marfo, Pablo Moriano, Deepak K. Tosh, Shirley V. Moore

Abstract:Modern vehicles rely on a myriad of electronic control units (ECUs) interconnected via controller area networks (CANs) for critical operations. Despite their ubiquitous use and reliability, CANs are susceptible to sophisticated cyberattacks, particularly masquerade attacks, which inject false data that mimic legitimate messages at the expected frequency. These attacks pose severe risks such as unintended acceleration, brake deactivation, and rogue steering. Traditional intrusion detection systems (IDS) often struggle to detect these subtle intrusions due to their seamless integration into normal traffic. This paper introduces a novel framework for detecting masquerade attacks in the CAN bus using graph machine learning (ML). We hypothesize that the integration of shallow graph embeddings with time series features derived from CAN frames enhances the detection of masquerade attacks. We show that by representing CAN bus frames as message sequence graphs (MSGs) and enriching each node with contextual statistical attributes from time series, we can enhance detection capabilities across various attack patterns compared to using only graph-based features. Our method ensures a comprehensive and dynamic analysis of CAN frame interactions, improving robustness and efficiency. Extensive experiments on the ROAD dataset validate the effectiveness of our approach, demonstrating statistically significant improvements in the detection rates of masquerade attacks compared to a baseline that uses only graph-based features, as confirmed by Mann-Whitney U and Kolmogorov-Smirnov tests (p < 0.05).
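
A message sequence graph can be sketched in a few lines. In this hypothetical version, frames are `(timestamp, can_id)` pairs in bus order, each arbitration ID is a node, a directed edge counts how often one ID is immediately followed by another, and the node attributes are simple inter-arrival statistics standing in for the paper's time-series features; the shallow graph embedding step is omitted.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def build_msg(frames):
    """Build a message sequence graph from (timestamp, can_id) frames in bus order.

    Returns (nodes, edges): nodes maps each ID to illustrative inter-arrival
    attributes (an assumption), edges counts consecutive ID transitions.
    """
    edges = Counter()
    last_seen = {}
    gaps = defaultdict(list)
    prev_id = None
    for ts, can_id in frames:
        if prev_id is not None:
            edges[(prev_id, can_id)] += 1  # can_id directly followed prev_id on the bus
        if can_id in last_seen:
            gaps[can_id].append(ts - last_seen[can_id])
        last_seen[can_id] = ts
        prev_id = can_id
    nodes = {
        can_id: {
            "mean_gap": mean(gaps[can_id]) if gaps[can_id] else 0.0,
            "std_gap": pstdev(gaps[can_id]) if gaps[can_id] else 0.0,
        }
        for can_id in last_seen
    }
    return nodes, edges
```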

Benchmarking Unsupervised Online IDS for Masquerade Attacks in CAN

Jun 19, 2024

Pablo Moriano, Steven C. Hespeler, Mingyan Li, Robert A. Bridges

Abstract:Vehicular controller area networks (CANs) are susceptible to masquerade attacks by malicious adversaries. In masquerade attacks, adversaries silence a targeted ID and then send malicious frames with forged content at the expected timing of benign frames. As masquerade attacks could seriously harm vehicle functionality and are the stealthiest attacks to detect in CAN, recent work has devoted attention to comparing frameworks for detecting masquerade attacks in CAN. However, most existing works report offline evaluations using CAN logs already collected using simulations that do not comply with the domain's real-time constraints. Here we contribute to advancing the state of the art by introducing a benchmark study of four different non-deep learning (DL)-based unsupervised online intrusion detection systems (IDS) for masquerade attacks in CAN. Our approach differs from existing benchmarks in that we analyze the effect of controlling streaming data conditions in a sliding window setting. In doing so, we use realistic masquerade attacks replayed from the ROAD dataset. We show that although the benchmarked IDS are not effective at detecting every attack type, the method that relies on detecting changes in the hierarchical structure of clusters of time series produces the best results at the expense of higher computational overhead. We discuss limitations, open challenges, and how the benchmarked methods can be used for practical unsupervised online CAN IDS for masquerade attacks.

* 15 pages, 9 figures, 3 tables
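
The sliding window setting comes down to two parameters applied to a live frame stream: the window size and the step between consecutive windows. The generator below is a minimal, hypothetical sketch of that mechanism (the name `sliding_windows` and its arguments are assumptions); each emitted window would be handed to one of the benchmarked detectors.

```python
from collections import deque


def sliding_windows(stream, size, step):
    """Yield fixed-size windows over a (possibly unbounded) stream, advancing by `step` items.

    The first window is emitted as soon as `size` items have arrived; only the
    most recent `size` items are kept in memory.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    buf = deque(maxlen=size)
    for n, item in enumerate(stream, start=1):
        buf.append(item)
        # Emit at n = size, size + step, size + 2*step, ...
        if n >= size and (n - size) % step == 0:
            yield tuple(buf)
```

With `size=3` and `step=2`, the stream `0..6` yields `(0, 1, 2)`, `(2, 3, 4)`, and `(4, 5, 6)`; setting `step` equal to `size` gives non-overlapping windows.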

Robustness of graph embedding methods for community detection

May 01, 2024

Zhi-Feng Wei, Pablo Moriano, Ramakrishnan Kannan

Abstract:This study investigates the robustness of graph embedding methods for community detection in the face of network perturbations, specifically edge deletions. Graph embedding techniques, which represent nodes as low-dimensional vectors, are widely used for various graph machine learning tasks due to their ability to capture structural properties of networks effectively. However, the impact of perturbations on the performance of these methods remains relatively understudied. The research considers state-of-the-art graph embedding methods from two families: matrix factorization (e.g., LE, LLE, HOPE, M-NMF) and random walk-based (e.g., DeepWalk, LINE, node2vec). Through experiments conducted on both synthetic and real-world networks, the study reveals varying degrees of robustness within each family of graph embedding methods. The robustness is found to be influenced by factors such as network size, initial community partition strength, and the type of perturbation. Notably, node2vec and LLE consistently demonstrate higher robustness for community detection across different scenarios, including networks with degree and community size heterogeneity. These findings highlight the importance of selecting an appropriate graph embedding method based on the specific characteristics of the network and the task at hand, particularly in scenarios where robustness to perturbations is crucial.

* 17 pages, 26 figures, 3 tables. Comments are welcome
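
A robustness experiment of this kind pairs a perturbation step with a comparison of the communities found before and after it. The helpers below are a hypothetical sketch: uniform random edge deletion and normalized mutual information (NMI, arithmetic-mean normalization) are assumptions about the protocol, since the abstract names neither, and the embedding-plus-clustering step in between is left out.

```python
import math
import random
from collections import Counter


def delete_edges(edges, fraction, seed=0):
    """Return a random subset of `edges` with `fraction` of them removed (uniform deletion)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    edges = list(edges)
    n_keep = len(edges) - round(fraction * len(edges))
    return random.Random(seed).sample(edges, n_keep)


def nmi(labels_a, labels_b):
    """NMI between two partitions given as per-node labels: 2 I(A;B) / (H(A) + H(B))."""
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    pab = Counter(zip(labels_a, labels_b))
    mi = sum(
        (c / n) * math.log((c / n) / ((pa[a] / n) * (pb[b] / n)))
        for (a, b), c in pab.items()
    )
    ha = -sum((c / n) * math.log(c / n) for c in pa.values())
    hb = -sum((c / n) * math.log(c / n) for c in pb.values())
    if ha + hb == 0:
        return 1.0  # both partitions are a single community
    return 2 * mi / (ha + hb)
```

A robust method keeps the NMI between the partitions detected on the original and perturbed graphs close to 1 as `fraction` grows.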

CANShield: Signal-based Intrusion Detection for Controller Area Networks

May 03, 2022

Md Hasan Shahriar, Yang Xiao, Pablo Moriano, Wenjing Lou, Y. Thomas Hou

Abstract:Modern vehicles rely on a fleet of electronic control units (ECUs) connected through controller area network (CAN) buses for critical vehicular control. However, with the expansion of advanced connectivity features in automobiles and the elevated risks of internal system exposure, the CAN bus is increasingly prone to intrusions and injection attacks. Ordinary injection attacks disrupt the typical timing properties of the CAN data stream, so rule-based intrusion detection systems (IDS) can easily detect them. However, advanced attackers can inject false data into the time series sensory data (signals) while the pattern and frequency of the CAN messages still look innocuous. Such attacks can bypass rule-based IDS or any anomaly-based IDS built on binary payload data. To make vehicles robust against such intelligent attacks, we propose CANShield, a signal-based intrusion detection framework for the CAN bus. CANShield consists of three modules: a data preprocessing module that handles the high-dimensional CAN data stream at the signal level and makes it suitable for a deep learning model; a data analyzer module consisting of multiple deep autoencoder (AE) networks, each analyzing the time-series data from a different temporal perspective; and finally an attack detection module that uses an ensemble method to make the final decision. Evaluation results on two high-fidelity signal-based CAN attack datasets show the high accuracy and responsiveness of CANShield in detecting a wide range of advanced intrusion attacks.

* 15 pages, 6 figures, A version of this paper is accepted by escar USA 2022
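
The multi-perspective design can be illustrated without the networks themselves. In the hypothetical sketch below, each temporal view samples the signal history at a different stride, and the ensemble flags an attack when enough views report a reconstruction error above their own threshold. The strides, window length, thresholds, and voting rule are assumptions; in CANShield the per-view errors would come from the trained autoencoders.

```python
def temporal_views(history, window, strides):
    """Build one view per stride: `window` samples spaced `stride` steps apart, ending at the newest sample.

    `history` is a list of per-time-step signal vectors (or scalars), oldest first.
    """
    views = {}
    for s in strides:
        needed = (window - 1) * s + 1
        if len(history) < needed:
            raise ValueError(f"stride {s} needs at least {needed} samples")
        views[s] = history[-needed::s]
    return views


def ensemble_vote(view_errors, thresholds, min_votes):
    """Flag an attack when at least `min_votes` views exceed their own error threshold.

    `view_errors` maps stride -> reconstruction error, which would come from the
    autoencoder assigned to that view (not modeled here; hypothetical input).
    """
    votes = sum(1 for s, err in view_errors.items() if err > thresholds[s])
    return votes >= min_votes
```

Short strides capture fine-grained, recent dynamics, while long strides let a model of the same input size see a longer stretch of history.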

Detecting CAN Masquerade Attacks with Signal Clustering Similarity

Jan 07, 2022

Pablo Moriano, Robert A. Bridges, Michael D. Iannacone

Abstract:Vehicular Controller Area Networks (CANs) are susceptible to cyber attacks of different levels of sophistication. Fabrication attacks are the easiest to administer (an adversary simply sends extra frames on a CAN) but also the easiest to detect because they disrupt frame frequency. To overcome time-based detection methods, adversaries must administer masquerade attacks by sending frames in lieu of (and therefore at the expected time of) benign frames but with malicious payloads. Research efforts have proven that CAN attacks, and masquerade attacks in particular, can affect vehicle functionality. Examples include causing unintended acceleration, deactivating the vehicle's brakes, and steering the vehicle. We hypothesize that masquerade attacks modify the nuanced correlations of CAN signal time series and how they cluster together. Therefore, changes in cluster assignments should indicate anomalous behavior. We confirm this hypothesis by leveraging our previously developed capability for reverse engineering CAN signals (i.e., CAN-D [Controller Area Network Decoder]) and focus on advancing the state of the art for detecting masquerade attacks by analyzing time series extracted from raw CAN frames. Specifically, we demonstrate that masquerade attacks can be detected by computing time series clustering similarity using hierarchical clustering on the vehicle's CAN signals (time series) and comparing the clustering similarity across CAN captures with and without attacks. We test our approach on a previously collected CAN dataset with masquerade attacks (i.e., the ROAD dataset) and develop a forensic tool as a proof of concept to demonstrate the potential of the proposed approach for detecting CAN masquerade attacks.

* 7 pages, 7 figures, 1 table
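
The capture-to-capture comparison can be sketched in pure Python: cluster the signals hierarchically, then score how similar two clusterings are. Assumptions, not taken from the paper: the distance is 1 - |Pearson r|, the linkage is average linkage cut at a fixed number of clusters `k`, and the similarity score is the Rand index; signal extraction with CAN-D is out of scope.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = math.sqrt(sum((a - mx) ** 2 for a in x))
    vy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (vx * vy) if vx and vy else 0.0


def cluster_signals(signals, k):
    """Average-linkage agglomerative clustering of named signals into k clusters."""
    names = list(signals)
    dist = {}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = 1 - abs(pearson(signals[a], signals[b]))
    clusters = [[n] for n in names]
    while len(clusters) > k:
        # Merge the pair of clusters with the smallest average pairwise distance.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            avg = sum(dist[(a, b)] for a in clusters[i] for b in clusters[j])
            avg /= len(clusters[i]) * len(clusters[j])
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return {n: c for c, members in enumerate(clusters) for n in members}


def rand_index(assign_a, assign_b):
    """Fraction of signal pairs on which two clusterings agree (same vs. different cluster)."""
    pairs = list(combinations(sorted(assign_a), 2))
    agree = sum((assign_a[x] == assign_a[y]) == (assign_b[x] == assign_b[y]) for x, y in pairs)
    return agree / len(pairs)
```

A Rand index well below the values observed between benign captures would point to a masquerade attack reshaping the signal correlations.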

Graph-Based Machine Learning Improves Just-in-Time Defect Prediction

Oct 11, 2021

Jonathan Bryan, Pablo Moriano

Abstract:The increasing complexity of today's software requires the contribution of thousands of developers. This complex collaboration structure makes developers more likely to introduce defect-prone changes that lead to software faults. Determining when these defect-prone changes are introduced has proven challenging, and using traditional machine learning (ML) methods to make these determinations seems to have reached a plateau. In this work, we build contribution graphs consisting of developers and source files to capture the nuanced complexity of changes required to build software. By leveraging these contribution graphs, our research shows the potential of using graph-based ML to improve Just-In-Time (JIT) defect prediction. We hypothesize that features extracted from the contribution graphs may be better predictors of defect-prone changes than intrinsic features derived from software characteristics. We corroborate our hypothesis using graph-based ML for classifying edges that represent defect-prone changes. This new framing of the JIT defect prediction problem leads to remarkably better results. We test our approach on 14 open-source projects and show that our best model can predict whether or not a code change will lead to a defect with an F1 score as high as 86.25%. This represents an increase of as much as 55.4% over the state-of-the-art in JIT defect prediction. We describe limitations, open challenges, and how this method can be used for operational JIT defect prediction.

* 9 pages, 2 figures, 2 tables
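
To illustrate the edge-classification framing, the sketch below builds a developer-file contribution graph from commits and derives a few structural features for the (developer, file) edge of an incoming change. The features (node degrees and prior edit count) are hypothetical stand-ins for the paper's graph features; the F1 helper uses the standard definition 2TP / (2TP + FP + FN).

```python
from collections import defaultdict


def contribution_graph(commits):
    """Build a developer-file bipartite graph from (developer, [files]) commits."""
    dev_files = defaultdict(set)
    file_devs = defaultdict(set)
    edge_count = defaultdict(int)
    for dev, files in commits:
        for f in files:
            dev_files[dev].add(f)
            file_devs[f].add(dev)
            edge_count[(dev, f)] += 1  # how often this developer touched this file
    return dev_files, file_devs, edge_count


def edge_features(dev, f, dev_files, file_devs, edge_count):
    """Illustrative structural features for the (developer, file) edge of a new change."""
    return {
        "dev_degree": len(dev_files.get(dev, ())),
        "file_degree": len(file_devs.get(f, ())),
        "prior_edits": edge_count.get((dev, f), 0),
    }


def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = defect-prone change)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

To avoid leakage, the graph should be built only from commits that precede the change being classified.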

Time-Based CAN Intrusion Detection Benchmark

Jan 14, 2021

Deborah H. Blevins, Pablo Moriano, Robert A. Bridges, Miki E. Verma, Michael D. Iannacone, Samuel C. Hollifield

Abstract:Modern vehicles are complex cyber-physical systems made of hundreds of electronic control units (ECUs) that communicate over controller area networks (CANs). This inherent complexity has expanded the CAN attack surface, which is vulnerable to message injection attacks. These injections change the overall timing characteristics of messages on the bus; thus, to detect these malicious messages, time-based intrusion detection systems (IDSs) have been proposed. However, time-based IDSs are usually trained and tested on low-fidelity datasets with unrealistic, labeled attacks. This makes evaluating, comparing, and validating IDSs difficult. Here we detail and benchmark four time-based IDSs against the newly published ROAD dataset, the first open CAN IDS dataset with real (non-simulated) stealthy attacks with physically verified effects. We found that methods that perform hypothesis testing by explicitly estimating message timing distributions have lower performance than methods that seek anomalies in a distribution-related statistic. In particular, these "distribution-agnostic" methods outperform "distribution-based" methods by at least 55% in area under the precision-recall curve (AUC-PR). Our results expand the body of knowledge of CAN time-based IDSs by providing details of these methods and reporting their results when tested on datasets with real advanced attacks. Finally, we develop an after-market plug-in detector using lightweight hardware, which can be used to deploy the best performing IDS method on nearly any vehicle.

* 7 pages, 2 figures
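
A distribution-agnostic detector in this sense tracks a statistic of message timing rather than modeling the full inter-arrival distribution. The sketch below, for a single CAN ID, uses the per-window mean inter-arrival time and scores each window by its z-score against benign traffic. The statistic, the non-overlapping windows, and the function names are assumptions for illustration; this is not one of the four benchmarked IDSs.

```python
from statistics import mean, pstdev


def inter_arrival(timestamps):
    """Gaps between consecutive frames of one CAN ID."""
    return [b - a for a, b in zip(timestamps, timestamps[1:])]


def fit_window_stat(benign_ts, window):
    """Learn mean/std of the per-window mean inter-arrival time on benign traffic."""
    gaps = inter_arrival(benign_ts)
    stats = [mean(gaps[i:i + window]) for i in range(0, len(gaps) - window + 1, window)]
    if not stats:
        raise ValueError("benign capture is shorter than one window")
    return mean(stats), pstdev(stats)


def score_windows(ts, window, mu, sigma):
    """Anomaly score |z| for each non-overlapping window of inter-arrival times."""
    gaps = inter_arrival(ts)
    scores = []
    for i in range(0, len(gaps) - window + 1, window):
        m = mean(gaps[i:i + window])
        if sigma > 0:
            scores.append(abs(m - mu) / sigma)
        else:
            scores.append(0.0 if m == mu else float("inf"))
    return scores
```

Injected frames shorten the gaps for the targeted ID, so windows containing an injection drift far from the benign mean, and sweeping a threshold over these scores traces out the precision-recall curve used for AUC-PR.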

Using Bursty Announcements for Early Detection of BGP Routing Anomalies

May 14, 2019

Pablo Moriano, Raquel Hill, L. Jean Camp

Abstract:Despite the robust structure of the Internet, it is still susceptible to disruptive routing updates that prevent network traffic from reaching its destination. In this work, we propose a method for early detection of large-scale disruptions based on the analysis of bursty BGP announcements. We hypothesize that the occurrence of large-scale disruptions is preceded by bursty announcements. Our method is grounded in analysis of changes in the inter-arrival times of announcements. BGP announcements that are associated with disruptive updates tend to occur in groups of relatively high frequency, followed by periods of infrequent activity. To test our hypothesis, we quantify the burstiness of inter-arrival times around the dates and times of three large-scale incidents: the Indosat hijacking event in April 2014, the Telecom Malaysia leak in June 2015, and the Bharti Airtel Ltd. hijack in November 2015. We show that we can detect these events several hours prior to when they were originally detected. We propose an algorithm that leverages the burstiness of disruptive updates to provide early detection of large-scale malicious incidents using local collector data. We describe limitations, open challenges, and how this method can be used for large-scale routing anomaly detection.

* 15 pages, 13 figures, 1 table
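
The abstract does not give the burstiness formula. A widely used measure for inter-arrival times is the Goh-Barabási coefficient B = (σ - μ)/(σ + μ), where μ and σ are the mean and standard deviation of the gaps; B = -1 for perfectly periodic events, B ≈ 0 for a Poisson process, and B approaches 1 for highly bursty activity. Treating it as the paper's measure is an assumption; the minimal sketch below computes it for a list of announcement timestamps.

```python
from statistics import mean, pstdev


def burstiness(timestamps):
    """Goh-Barabasi burstiness B = (sigma - mu) / (sigma + mu) of inter-arrival times.

    Assumed measure for illustration; timestamps must be sorted in time.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    mu, sigma = mean(gaps), pstdev(gaps)
    if mu + sigma == 0:
        raise ValueError("B is undefined when every gap is zero")
    return (sigma - mu) / (sigma + mu)
```

Computing B over a sliding window of recent announcements from a local collector and alerting when it crosses a chosen threshold is one way to turn the measure into an early-warning signal; the threshold would have to be tuned on historical data.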
