Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yanlei Diao

Unsupervised Anomaly Detection in Multivariate Time Series across Heterogeneous Domains

Mar 29, 2025

Vincent Jacob, Yanlei Diao

Abstract:The widespread adoption of digital services, along with the scale and complexity at which they operate, has made incidents in IT operations increasingly more likely, diverse, and impactful. This has led to the rapid development of a central aspect of "Artificial Intelligence for IT Operations" (AIOps), focusing on detecting anomalies in vast amounts of multivariate time series data generated by service entities. In this paper, we begin by introducing a unifying framework for benchmarking unsupervised anomaly detection (AD) methods, and highlight the problem of shifts in normal behaviors that can occur in practical AIOps scenarios. To tackle anomaly detection under domain shift, we then cast the problem in the framework of domain generalization and propose a novel approach, Domain-Invariant VAE for Anomaly Detection (DIVAD), to learn domain-invariant representations for unsupervised anomaly detection. Our evaluation results using the Exathlon benchmark show that the two main DIVAD variants significantly outperform the best unsupervised AD method in maximum performance, with 20% and 15% improvements in maximum peak F1-scores, respectively. Evaluation using the Application Server Dataset further demonstrates the broader applicability of our domain generalization methods.

Via

Access Paper or Ask Questions

Neural-based Modeling for Performance Tuning of Spark Data Analytics

Jan 20, 2021

Khaled Zaouk, Fei Song, Chenghao Lyu, Yanlei Diao

Figure 1 for Neural-based Modeling for Performance Tuning of Spark Data Analytics

Figure 2 for Neural-based Modeling for Performance Tuning of Spark Data Analytics

Figure 3 for Neural-based Modeling for Performance Tuning of Spark Data Analytics

Figure 4 for Neural-based Modeling for Performance Tuning of Spark Data Analytics

Abstract:Cloud data analytics has become an integral part of enterprise business operations for data-driven insight discovery. Performance modeling of cloud data analytics is crucial for performance tuning and other critical operations in the cloud. Traditional modeling techniques fail to adapt to the high degree of diversity in workloads and system behaviors in this domain. In this paper, we bring recent Deep Learning techniques to bear on the process of automated performance modeling of cloud data analytics, with a focus on Spark data analytics as representative workloads. At the core of our work is the notion of learning workload embeddings (with a set of desired properties) to represent fundamental computational characteristics of different jobs, which enable performance prediction when used together with job configurations that control resource allocation and other system knobs. Our work provides an in-depth study of different modeling choices that suit our requirements. Results of extensive experiments reveal the strengths and limitations of different modeling methods, as well as superior performance of our best performing method over a state-of-the-art modeling tool for cloud analytics.

Via

Access Paper or Ask Questions

AnomalyBench: An Open Benchmark for Explainable Anomaly Detection

Oct 10, 2020

Vincent Jacob, Fei Song, Arnaud Stiegler, Yanlei Diao, Nesime Tatbul

Figure 1 for AnomalyBench: An Open Benchmark for Explainable Anomaly Detection

Figure 2 for AnomalyBench: An Open Benchmark for Explainable Anomaly Detection

Figure 3 for AnomalyBench: An Open Benchmark for Explainable Anomaly Detection

Figure 4 for AnomalyBench: An Open Benchmark for Explainable Anomaly Detection

Abstract:Access to high-quality data repositories and benchmarks have been instrumental in advancing the state of the art in many domains, as they provide the research community a common ground for training, testing, evaluating, comparing, and experimenting with novel machine learning models. Lack of such community resources for anomaly detection (AD) severely limits progress. In this report, we present AnomalyBench, the first comprehensive benchmark for explainable AD over high-dimensional (2000+) time series data. AnomalyBench has been systematically constructed based on real data traces from ~100 repeated executions of 10 large-scale stream processing jobs on a Spark cluster. 30+ of these executions were disturbed by introducing ~100 instances of different types of anomalous events (e.g., misbehaving inputs, resource contention, process failures). For each of these anomaly instances, ground truth labels for the root-cause interval as well as those for the effect interval are available, providing a means for supporting both AD tasks and explanation discovery (ED) tasks via root-cause analysis. We demonstrate the key design features and practical utility of AnomalyBench through an experimental study with three state-of-the-art semi-supervised AD techniques.

Via

Access Paper or Ask Questions