Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Marius Fracarolli

Embedding-Space Data Augmentation to Prevent Membership Inference Attacks in Clinical Time Series Forecasting

Nov 07, 2025

Marius Fracarolli, Michael Staniek, Stefan Riezler

Figure 1 for Embedding-Space Data Augmentation to Prevent Membership Inference Attacks in Clinical Time Series Forecasting

Figure 2 for Embedding-Space Data Augmentation to Prevent Membership Inference Attacks in Clinical Time Series Forecasting

Figure 3 for Embedding-Space Data Augmentation to Prevent Membership Inference Attacks in Clinical Time Series Forecasting

Figure 4 for Embedding-Space Data Augmentation to Prevent Membership Inference Attacks in Clinical Time Series Forecasting

Abstract:Balancing strong privacy guarantees with high predictive performance is critical for time series forecasting (TSF) tasks involving Electronic Health Records (EHR). In this study, we explore how data augmentation can mitigate Membership Inference Attacks (MIA) on TSF models. We show that retraining with synthetic data can substantially reduce the effectiveness of loss-based MIAs by reducing the attacker's true-positive to false-positive ratio. The key challenge is generating synthetic samples that closely resemble the original training data to confuse the attacker, while also introducing enough novelty to enhance the model's ability to generalize to unseen data. We examine multiple augmentation strategies - Zeroth-Order Optimization (ZOO), a variant of ZOO constrained by Principal Component Analysis (ZOO-PCA), and MixUp - to strengthen model resilience without sacrificing accuracy. Our experimental results show that ZOO-PCA yields the best reductions in TPR/FPR ratio for MIA attacks without sacrificing performance on test data.

* Accepted as a proceedings paper at Machine Learning for Health (ML4H) symposium 2025, December 1-2, 2025, San Diego, United States, 15 pages

Via

Access Paper or Ask Questions

Early Prediction of Causes (not Effects) in Healthcare by Long-Term Clinical Time Series Forecasting

Aug 07, 2024

Michael Staniek, Marius Fracarolli, Michael Hagmann, Stefan Riezler

Abstract:Machine learning for early syndrome diagnosis aims to solve the intricate task of predicting a ground truth label that most often is the outcome (effect) of a medical consensus definition applied to observed clinical measurements (causes), given clinical measurements observed several hours before. Instead of focusing on the prediction of the future effect, we propose to directly predict the causes via time series forecasting (TSF) of clinical variables and determine the effect by applying the gold standard consensus definition to the forecasted values. This method has the invaluable advantage of being straightforwardly interpretable to clinical practitioners, and because model training does not rely on a particular label anymore, the forecasted data can be used to predict any consensus-based label. We exemplify our method by means of long-term TSF with Transformer models, with a focus on accurate prediction of sparse clinical variables involved in the SOFA-based Sepsis-3 definition and the new Simplified Acute Physiology Score (SAPS-II) definition. Our experiments are conducted on two datasets and show that contrary to recent proposals which advocate set function encoders for time series and direct multi-step decoders, best results are achieved by a combination of standard dense encoders with iterative multi-step decoders. The key for success of iterative multi-step decoding can be attributed to its ability to capture cross-variate dependencies and to a student forcing training strategy that teaches the model to rely on its own previous time step predictions for the next time step prediction.

Via

Access Paper or Ask Questions