Zeyan Li

Tsinghua University

ChatTS: Aligning Time Series with LLMs via Synthetic Data for Enhanced Understanding and Reasoning

Dec 04, 2024

Zhe Xie, Zeyan Li, Xiao He, Longlong Xu, Xidao Wen, Tieying Zhang, Jianjun Chen, Rui Shi, Dan Pei

Abstract: Understanding time series is crucial for its application in real-world scenarios. Recently, large language models (LLMs) have been increasingly applied to time series tasks, leveraging their strong language capabilities to enhance various applications. However, research on multimodal LLMs (MLLMs) for time series understanding and reasoning remains limited, primarily due to the scarcity of high-quality datasets that align time series with textual information. This paper introduces ChatTS, a novel MLLM designed for time series analysis. ChatTS treats time series as a modality, similar to how vision MLLMs process images, enabling it to perform both understanding and reasoning with time series. To address the scarcity of training data, we propose an attribute-based method for generating synthetic time series with detailed attribute descriptions. We further introduce Time Series Evol-Instruct, a novel approach that generates diverse time series Q&As, enhancing the model's reasoning capabilities. To the best of our knowledge, ChatTS is the first MLLM that takes multivariate time series as input, which is fine-tuned exclusively on synthetic datasets. We evaluate its performance using benchmark datasets with real-world data, including six alignment tasks and four reasoning tasks. Our results show that ChatTS significantly outperforms existing vision-based MLLMs (e.g., GPT-4o) and text/agent-based LLMs, achieving a 46.0% improvement in alignment tasks and a 25.8% improvement in reasoning tasks.

* 14 pages, 14 figures
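
The attribute-based generation idea in the abstract above can be illustrated with a toy sketch. It is not the authors' generator: the attribute names (`slope`, `amplitude`, `period`, `noise_std`, `spike`), the function `generate_series`, and the wording of the description are all assumptions made for illustration. The point is only that a series and its textual description can be produced from the same attribute set, so the text is aligned with the data by construction.

```python
import math
import random


def generate_series(attrs, length=128, seed=0):
    """Build a synthetic series and a matching text description from attributes.

    `attrs` is a hypothetical attribute dict, not a format from the paper.
    """
    rng = random.Random(seed)
    values = []
    for t in range(length):
        trend = attrs["slope"] * t
        seasonal = attrs["amplitude"] * math.sin(2 * math.pi * t / attrs["period"])
        noise = rng.gauss(0.0, attrs["noise_std"])
        values.append(trend + seasonal + noise)

    # Optional local attribute: a single additive spike at a known position.
    spike = attrs.get("spike")
    if spike is not None:
        values[spike["position"]] += spike["height"]

    # The description is derived from the same attributes as the values.
    slope = attrs["slope"]
    direction = "increasing" if slope > 0 else "decreasing" if slope < 0 else "flat"
    description = (
        f"The series has a {direction} trend and a seasonal cycle "
        f"of period {attrs['period']}."
    )
    if spike is not None:
        description += (
            f" There is a spike of height {spike['height']} "
            f"at index {spike['position']}."
        )
    return values, description


example_attrs = {
    "slope": 0.5,
    "amplitude": 2.0,
    "period": 16,
    "noise_std": 0.0,
    "spike": {"position": 40, "height": 10.0},
}
series, text = generate_series(example_attrs, length=64)
print(text)
```

Because the description is generated from the attributes rather than inferred from the values, every statement in it is true of the series, which is what makes such pairs usable as training data without manual labeling.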

Generic and Robust Root Cause Localization for Multi-Dimensional Data in Online Service Systems

May 05, 2023

Zeyan Li, Junjie Chen, Yihao Chen, Chengyang Luo, Yiwei Zhao, Yongqian Sun, Kaixin Sui, Xiping Wang, Dapeng Liu, Xing Jin(+2 more)

Abstract: Localizing root causes for multi-dimensional data is critical to ensure online service systems' reliability. When a fault occurs, only the measure values within specific attribute combinations are abnormal. Such attribute combinations are substantial clues to the underlying root causes and thus are called root causes of multi-dimensional data. This paper proposes a generic and robust root cause localization approach for multi-dimensional data, PSqueeze. We propose a generic property of root causes for multi-dimensional data, the generalized ripple effect (GRE). Based on it, we propose a novel probabilistic cluster method and a robust heuristic search method. Moreover, we identify the importance of determining external root causes and propose an effective method for the first time in the literature. Our experiments on two real-world datasets with 5400 faults show that the F1-score of PSqueeze outperforms baselines by 32.89%, while the localization time is around 10 seconds across all cases. The F1-score of PSqueeze in determining external root causes reaches 0.90. Furthermore, case studies in several production systems demonstrate that PSqueeze is helpful for fault diagnosis in the real world.

* Accepted by the Journal of Systems and Software on May 4, 2023

Causal Inference-Based Root Cause Analysis for Online Service Systems with Intervention Recognition

Jun 13, 2022

Mingjie Li, Zeyan Li, Kanglin Yin, Xiaohui Nie, Wenchi Zhang, Kaixin Sui, Dan Pei

Abstract: Fault diagnosis is critical in many domains, as faults may lead to safety threats or economic losses. In the field of online service systems, operators rely on enormous amounts of monitoring data to detect and mitigate failures. Quickly recognizing a small set of root cause indicators for the underlying fault can save much time for failure mitigation. In this paper, we formulate the root cause analysis problem as a new causal inference task named intervention recognition. We propose a novel unsupervised causal inference-based method named Causal Inference-based Root Cause Analysis (CIRCA). The core idea is a sufficient condition for a monitoring variable to be a root cause indicator, i.e., the change of probability distribution conditioned on the parents in the Causal Bayesian Network (CBN). Towards the application in online service systems, CIRCA constructs a graph among monitoring metrics based on the knowledge of system architecture and a set of causal assumptions. The simulation study illustrates the theoretical reliability of CIRCA. The performance on a real-world dataset further shows that CIRCA can improve the recall of the top-1 recommendation by 25% over the best baseline method.

* Accepted at KDD 2022 Applied Data Science Track
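
The criterion stated in the abstract above (a metric is a root cause indicator when its distribution conditioned on its parents changes) can be sketched in a few lines. This is a simplified illustration under loud assumptions, not the CIRCA implementation: it assumes a single parent, a linear relationship fitted by least squares on pre-fault data, and a residual z-score as the change measure. The names `fit_line` and `intervention_score` and the toy data are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def intervention_score(parent_before, child_before, parent_after, child_after):
    """Largest fault-period residual of the child given its parent,
    in units of the pre-fault residual standard deviation."""
    slope, intercept = fit_line(parent_before, child_before)
    residuals_before = [
        y - (slope * x + intercept) for x, y in zip(parent_before, child_before)
    ]
    sigma = statistics.pstdev(residuals_before) or 1e-9  # avoid division by zero
    return max(
        abs(y - (slope * x + intercept)) / sigma
        for x, y in zip(parent_after, child_after)
    )


# Toy pre-fault data: latency is roughly 2 * load + 5 with small noise.
load_before = [10, 11, 12, 13, 14, 15, 16, 17]
noise = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2]
latency_before = [2 * x + 5 + n for x, n in zip(load_before, noise)]

# Case 1: load surges and latency follows the usual relationship
# (the anomaly is propagated from the parent).
propagated = intervention_score(
    load_before, latency_before, [30, 31, 32], [65, 67, 69]
)

# Case 2: load is normal but latency jumps by 20
# (the conditional distribution of latency itself has changed).
intervened = intervention_score(
    load_before, latency_before, [10, 11, 12], [45, 47, 49]
)

print(f"propagated score: {propagated:.1f}, intervened score: {intervened:.1f}")
```

In both cases the latency values are far above their pre-fault range, so a marginal anomaly detector would flag both. Conditioning on the parent separates them: only the second case yields a large score, which matches the intuition behind treating a conditional distribution change as the sign of an intervention.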

GenAD: General Representations of Multivariate Time Series for Anomaly Detection

Feb 09, 2022

Xiaolei Hua, Lin Zhu, Shenglin Zhang, Zeyan Li, Su Wang, Dong Zhou, Shuo Wang, Chao Deng

Abstract: The reliability of wireless base stations in China Mobile is of vital importance, because cell phone users are connected to the stations and the behaviors of the stations are directly related to user experience. Although the monitoring of station behaviors can be realized by anomaly detection on multivariate time series, due to complex correlations and various temporal patterns of multivariate series in large-scale stations, building a general unsupervised anomaly detection model with a higher F1-score remains a challenging task. In this paper, we propose a General representation of multivariate time series for Anomaly Detection (GenAD). First, we pre-train a general model on large-scale wireless base stations with self-supervision, which can be easily transferred to anomaly detection for a specific station with a small amount of training data. Second, we employ Multi-Correlation Attention and Time-Series Attention to represent the correlations and temporal patterns of the stations. With the above innovations, GenAD increases the F1-score by a total of 9% on real-world datasets in China Mobile, while the performance does not significantly degrade on public datasets with only 10% of the training data.

Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications

Feb 12, 2018

Haowen Xu, Wenxiao Chen, Nengwen Zhao, Zeyan Li, Jiahao Bu, Zhihan Li, Ying Liu, Youjian Zhao, Dan Pei, Yang Feng(+3 more)

Abstract: To ensure undisrupted business, large Internet companies need to closely monitor various KPIs (e.g., Page Views, number of online users, and number of orders) of their Web applications, to accurately detect anomalies and trigger timely troubleshooting/mitigation. However, anomaly detection for these seasonal KPIs with various patterns and data quality has been a great challenge, especially without labels. In this paper, we propose Donut, an unsupervised anomaly detection algorithm based on VAE. Thanks to a few of our key techniques, Donut greatly outperforms a state-of-the-art supervised ensemble approach and a baseline VAE approach, and its best F-scores range from 0.75 to 0.9 for the studied KPIs from a top global Internet company. We come up with a novel KDE interpretation of reconstruction for Donut, making it the first VAE-based anomaly detection algorithm with a solid theoretical explanation.

* 12 pages (including references), 17 figures, submitted to WWW 2018: The 2018 Web Conference, April 23--27, 2018, Lyon, France. The contents discarded from the conference version due to the 9-page limitation are also included in this version
