Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yasuko Matsubara

Robust and Explainable Detector of Time Series Anomaly via Augmenting Multiclass Pseudo-Anomalies

May 27, 2025

Kohei Obata, Yasuko Matsubara, Yasushi Sakurai

Abstract:Unsupervised anomaly detection in time series has been a pivotal research area for decades. Current mainstream approaches focus on learning normality, on the assumption that all or most of the samples in the training set are normal. However, anomalies in the training set (i.e., anomaly contamination) can be misleading. Recent studies employ data augmentation to generate pseudo-anomalies and learn the boundary separating the training samples from the augmented samples. Although this approach mitigates anomaly contamination if augmented samples mimic unseen real anomalies, it suffers from several limitations. (1) Covering a wide range of time series anomalies is challenging. (2) It disregards augmented samples that resemble normal samples (i.e., false anomalies). (3) It places too much trust in the labels of training and augmented samples. In response, we propose RedLamp, which employs diverse data augmentations to generate multiclass pseudo-anomalies and learns the multiclass boundary. Such multiclass pseudo-anomalies cover a wide variety of time series anomalies. We conduct multiclass classification using soft labels, which prevents the model from being overconfident and ensures its robustness against contaminated/false anomalies. The learned latent space is inherently explainable as it is trained to separate pseudo-anomalies into multiclasses. Extensive experiments demonstrate the effectiveness of RedLamp in anomaly detection and its robustness against anomaly contamination.

* Accepted by KDD 2025

Via

Access Paper or Ask Questions

D-Tracker: Modeling Interest Diffusion in Social Activity Tensor Data Streams

May 01, 2025

Shingo Higashiguchi, Yasuko Matsubara, Koki Kawabata, Taichi Murayama, Yasushi Sakurai

Abstract:Large quantities of social activity data, such as weekly web search volumes and the number of new infections with infectious diseases, reflect peoples' interests and activities. It is important to discover temporal patterns from such data and to forecast future activities accurately. However, modeling and forecasting social activity data streams is difficult because they are high-dimensional and composed of multiple time-varying dynamics such as trends, seasonality, and interest diffusion. In this paper, we propose D-Tracker, a method for continuously capturing time-varying temporal patterns within social activity tensor data streams and forecasting future activities. Our proposed method has the following properties: (a) Interpretable: it incorporates the partial differential equation into a tensor decomposition framework and captures time-varying temporal patterns such as trends, seasonality, and interest diffusion between locations in an interpretable manner; (b) Automatic: it has no hyperparameters and continuously models tensor data streams fully automatically; (c) Scalable: the computation time of D-Tracker is independent of the time series length. Experiments using web search volume data obtained from GoogleTrends, and COVID-19 infection data obtained from COVID-19 Open Data Repository show that our method can achieve higher forecasting accuracy in less computation time than existing methods while extracting the interest diffusion between locations. Our source code and datasets are available at {https://github.com/Higashiguchi-Shingo/D-Tracker.

* ACM SIGKDD 2025 (KDD2025)

Via

Access Paper or Ask Questions

ExPath: Towards Explaining Targeted Pathways for Biological Knowledge Bases

Feb 25, 2025

Rikuto Kotoge, Ziwei Yang, Zheng Chen, Yushun Dong, Yasuko Matsubara, Jimeng Sun, Yasushi Sakurai

Abstract:Biological knowledge bases provide systemically functional pathways of cells or organisms in terms of molecular interaction. However, recognizing more targeted pathways, particularly when incorporating wet-lab experimental data, remains challenging and typically requires downstream biological analyses and expertise. In this paper, we frame this challenge as a solvable graph learning and explaining task and propose a novel pathway inference framework, ExPath, that explicitly integrates experimental data, specifically amino acid sequences (AA-seqs), to classify various graphs (bio-networks) in biological databases. The links (representing pathways) that contribute more to classification can be considered as targeted pathways. Technically, ExPath comprises three components: (1) a large protein language model (pLM) that encodes and embeds AA-seqs into graph, overcoming traditional obstacles in processing AA-seq data, such as BLAST; (2) PathMamba, a hybrid architecture combining graph neural networks (GNNs) with state-space sequence modeling (Mamba) to capture both local interactions and global pathway-level dependencies; and (3) PathExplainer, a subgraph learning module that identifies functionally critical nodes and edges through trainable pathway masks. We also propose ML-oriented biological evaluations and a new metric. The experiments involving 301 bio-networks evaluations demonstrate that pathways inferred by ExPath maintain biological meaningfulness. We will publicly release curated 301 bio-network data soon.

* Under review

Via

Access Paper or Ask Questions

Modeling Time-evolving Causality over Data Streams

Feb 13, 2025

Naoki Chihara, Yasuko Matsubara, Ren Fujiwara, Yasushi Sakurai

Abstract:Given an extensive, semi-infinite collection of multivariate coevolving data sequences (e.g., sensor/web activity streams) whose observations influence each other, how can we discover the time-changing cause-and-effect relationships in co-evolving data streams? How efficiently can we reveal dynamical patterns that allow us to forecast future values? In this paper, we present a novel streaming method, ModePlait, which is designed for modeling such causal relationships (i.e., time-evolving causality) in multivariate co-evolving data streams and forecasting their future values. The solution relies on characteristics of the causal relationships that evolve over time in accordance with the dynamic changes of exogenous variables. ModePlait has the following properties: (a) Effective: it discovers the time-evolving causality in multivariate co-evolving data streams by detecting the transitions of distinct dynamical patterns adaptively. (b) Accurate: it enables both the discovery of time-evolving causality and the forecasting of future values in a streaming fashion. (c) Scalable: our algorithm does not depend on data stream length and thus is applicable to very large sequences. Extensive experiments on both synthetic and real-world datasets demonstrate that our proposed model outperforms state-of-the-art methods in terms of discovering the time-evolving causality as well as forecasting.

* Accepted by KDD'25

Via

Access Paper or Ask Questions

SODor: Long-Term EEG Partitioning for Seizure Onset Detection

Dec 20, 2024

Zheng Chen, Yasuko Matsubara, Yasushi Sakurai, Jimeng Sun

Figure 1 for SODor: Long-Term EEG Partitioning for Seizure Onset Detection

Figure 2 for SODor: Long-Term EEG Partitioning for Seizure Onset Detection

Figure 3 for SODor: Long-Term EEG Partitioning for Seizure Onset Detection

Figure 4 for SODor: Long-Term EEG Partitioning for Seizure Onset Detection

Abstract:Deep learning models have recently shown great success in classifying epileptic patients using EEG recordings. Unfortunately, classification-based methods lack a sound mechanism to detect the onset of seizure events. In this work, we propose a two-stage framework, \method, that explicitly models seizure onset through a novel task formulation of subsequence clustering. Given an EEG sequence, the framework first learns a set of second-level embeddings with label supervision. It then employs model-based clustering to explicitly capture long-term temporal dependencies in EEG sequences and identify meaningful subsequences. Epochs within a subsequence share a common cluster assignment (normal or seizure), with cluster or state transitions representing successful onset detections. Extensive experiments on three datasets demonstrate that our method can correct misclassifications, achieving 5%-11% classification improvements over other baselines and accurately detecting seizure onsets.

* Accepted at AAAI 2025

Via

Access Paper or Ask Questions

Modeling Latent Non-Linear Dynamical System over Time Series

Dec 12, 2024

Ren Fujiwara, Yasuko Matsubara, Yasushi Sakurai

Abstract:We study the problem of modeling a non-linear dynamical system when given a time series by deriving equations directly from the data. Despite the fact that time series data are given as input, models for dynamics and estimation algorithms that incorporate long-term temporal dependencies are largely absent from existing studies. In this paper, we introduce a latent state to allow time-dependent modeling and formulate this problem as a dynamics estimation problem in latent states. We face multiple technical challenges, including (1) modeling latent non-linear dynamics and (2) solving circular dependencies caused by the presence of latent states. To tackle these challenging problems, we propose a new method, Latent Non-Linear equation modeling (LaNoLem), that can model a latent non-linear dynamical system and a novel alternating minimization algorithm for effectively estimating latent states and model parameters. In addition, we introduce criteria to control model complexity without human intervention. Compared with the state-of-the-art model, LaNoLem achieves competitive performance for estimating dynamics while outperforming other methods in prediction.

* accepted at AAAI'25

Via

Access Paper or Ask Questions

GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation

Oct 17, 2024

Ziwei Yang, Zheng Chen, Xin Liu, Rikuto Kotoge, Peng Chen, Yasuko Matsubara, Yasushi Sakurai, Jimeng Sun

Figure 1 for GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation

Figure 2 for GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation

Figure 3 for GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation

Figure 4 for GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation

Abstract:Retrieving gene functional networks from knowledge databases presents a challenge due to the mismatch between disease networks and subtype-specific variations. Current solutions, including statistical and deep learning methods, often fail to effectively integrate gene interaction knowledge from databases or explicitly learn subtype-specific interactions. To address this mismatch, we propose GeSubNet, which learns a unified representation capable of predicting gene interactions while distinguishing between different disease subtypes. Graphs generated by such representations can be considered subtype-specific networks. GeSubNet is a multi-step representation learning framework with three modules: First, a deep generative model learns distinct disease subtypes from patient gene expression profiles. Second, a graph neural network captures representations of prior gene networks from knowledge databases, ensuring accurate physical gene interactions. Finally, we integrate these two representations using an inference loss that leverages graph generation capabilities, conditioned on the patient separation loss, to refine subtype-specific information in the learned representation. GeSubNet consistently outperforms traditional methods, with average improvements of 30.6%, 21.0%, 20.1%, and 56.6% across four graph evaluation metrics, averaged over four cancer datasets. Particularly, we conduct a biological simulation experiment to assess how the behavior of selected genes from over 11,000 candidates affects subtypes or patient distributions. The results show that the generated network has the potential to identify subtype-specific genes with an 83% likelihood of impacting patient distribution shifts. The GeSubNet resource is available: https://anonymous.4open.science/r/GeSubNet/

* Under review as a conference paper at ICLR 2025

Via

Access Paper or Ask Questions

SplitSEE: A Splittable Self-supervised Framework for Single-Channel EEG Representation Learning

Oct 15, 2024

Rikuto Kotoge, Zheng Chen, Tasuku Kimura, Yasuko Matsubara, Takufumi Yanagisawa, Haruhiko Kishima, Yasushi Sakurai

Abstract:While end-to-end multi-channel electroencephalography (EEG) learning approaches have shown significant promise, their applicability is often constrained in neurological diagnostics, such as intracranial EEG resources. When provided with a single-channel EEG, how can we learn representations that are robust to multi-channels and scalable across varied tasks, such as seizure prediction? In this paper, we present SplitSEE, a structurally splittable framework designed for effective temporal-frequency representation learning in single-channel EEG. The key concept of SplitSEE is a self-supervised framework incorporating a deep clustering task. Given an EEG, we argue that the time and frequency domains are two distinct perspectives, and hence, learned representations should share the same cluster assignment. To this end, we first propose two domain-specific modules that independently learn domain-specific representation and address the temporal-frequency tradeoff issue in conventional spectrogram-based methods. Then, we introduce a novel clustering loss to measure the information similarity. This encourages representations from both domains to coherently describe the same input by assigning them a consistent cluster. SplitSEE leverages a pre-training-to-fine-tuning framework within a splittable architecture and has following properties: (a) Effectiveness: it learns representations solely from single-channel EEG but has even outperformed multi-channel baselines. (b) Robustness: it shows the capacity to adapt across different channels with low performance variance. Superior performance is also achieved with our collected clinical dataset. (c) Scalability: With just one fine-tuning epoch, SplitSEE achieves high and stable performance using partial model layers.

* This paper has been accepted by ICDM2024

Via

Access Paper or Ask Questions

FredNormer: Frequency Domain Normalization for Non-stationary Time Series Forecasting

Oct 06, 2024

Xihao Piao, Zheng Chen, Yushun Dong, Yasuko Matsubara, Yasushi Sakurai

Abstract:Recent normalization-based methods have shown great success in tackling the distribution shift issue, facilitating non-stationary time series forecasting. Since these methods operate in the time domain, they may fail to fully capture the dynamic patterns that are more apparent in the frequency domain, leading to suboptimal results. This paper first theoretically analyzes how normalization methods affect frequency components. We prove that the current normalization methods that operate in the time domain uniformly scale non-zero frequencies, and thus, they struggle to determine components that contribute to more robust forecasting. Therefore, we propose FredNormer, which observes datasets from a frequency perspective and adaptively up-weights the key frequency components. To this end, FredNormer consists of two components: a statistical metric that normalizes the input samples based on their frequency stability and a learnable weighting layer that adjusts stability and introduces sample-specific variations. Notably, FredNormer is a plug-and-play module, which does not compromise the efficiency compared to existing normalization methods. Extensive experiments show that FredNormer improves the averaged MSE of backbone forecasting models by 33.3% and 55.3% on the ETTm2 dataset. Compared to the baseline normalization methods, FredNormer achieves 18 top-1 results and 6 top-2 results out of 28 settings.

Via

Access Paper or Ask Questions

Mining of Switching Sparse Networks for Missing Value Imputation in Multivariate Time Series

Sep 16, 2024

Kohei Obata, Koki Kawabata, Yasuko Matsubara, Yasushi Sakurai

Figure 1 for Mining of Switching Sparse Networks for Missing Value Imputation in Multivariate Time Series

Figure 2 for Mining of Switching Sparse Networks for Missing Value Imputation in Multivariate Time Series

Figure 3 for Mining of Switching Sparse Networks for Missing Value Imputation in Multivariate Time Series

Figure 4 for Mining of Switching Sparse Networks for Missing Value Imputation in Multivariate Time Series

Abstract:Multivariate time series data suffer from the problem of missing values, which hinders the application of many analytical methods. To achieve the accurate imputation of these missing values, exploiting inter-correlation by employing the relationships between sequences (i.e., a network) is as important as the use of temporal dependency, since a sequence normally correlates with other sequences. Moreover, exploiting an adequate network depending on time is also necessary since the network varies over time. However, in real-world scenarios, we normally know neither the network structure nor when the network changes beforehand. Here, we propose a missing value imputation method for multivariate time series, namely MissNet, that is designed to exploit temporal dependency with a state-space model and inter-correlation by switching sparse networks. The network encodes conditional independence between features, which helps us understand the important relationships for imputation visually. Our algorithm, which scales linearly with reference to the length of the data, alternatively infers networks and fills in missing values using the networks while discovering the switching of the networks. Extensive experiments demonstrate that MissNet outperforms the state-of-the-art algorithms for multivariate time series imputation and provides interpretable results.

* Accepted by KDD 2024

Via

Access Paper or Ask Questions