Manuel Burger

Data-Driven Discovery of Feature Groups in Clinical Time Series

Nov 11, 2025

Fedor Sergeev, Manuel Burger, Polina Leshetkina, Vincent Fortuin, Gunnar Rätsch, Rita Kuznetsova

Abstract: Clinical time series data are critical for patient monitoring and predictive modeling. These time series are typically multivariate and often comprise hundreds of heterogeneous features from different data sources. The grouping of features based on similarity and relevance to the prediction task has been shown to enhance the performance of deep learning architectures. However, defining these groups a priori using only semantic knowledge is challenging, even for domain experts. To address this, we propose a novel method that learns feature groups by clustering weights of feature-wise embedding layers. This approach seamlessly integrates into standard supervised training and discovers the groups that directly improve downstream performance on clinically relevant tasks. We demonstrate that our method outperforms static clustering approaches on synthetic data and achieves performance comparable to expert-defined groups on real-world medical data. Moreover, the learned feature groups are clinically interpretable, enabling data-driven discovery of task-relevant relationships between variables.

* Machine Learning for Health (ML4H) 2025 in Proceedings of Machine Learning Research 297
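
The abstract describes learning feature groups by clustering the weights of feature-wise embedding layers. As a rough illustration of only the clustering step, the sketch below assumes that each scalar feature i is embedded by its own linear map `weights[i] * x_i + biases[i]` (both d-dimensional), treats the concatenated weight and bias vector as that feature's descriptor, and clusters the descriptors with plain k-means. The embedding layout, the function names `kmeans` and `feature_groups`, and the choice of k-means are hypothetical; the paper's actual procedure, including how clustering is interleaved with supervised training, is not reproduced here.

```python
import numpy as np


def kmeans(points: np.ndarray, k: int, n_iter: int = 50) -> np.ndarray:
    """Lloyd's k-means with farthest-point initialization; returns one label per row."""
    # Deterministic init: start from the first point, then repeatedly add the point
    # farthest from all centers chosen so far.
    centers = [points[0]]
    for _ in range(1, k):
        dist_to_nearest = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[dist_to_nearest.argmax()])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assign every point to its nearest center (squared Euclidean distance).
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its members; empty clusters keep their center.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels


def feature_groups(weights: np.ndarray, biases: np.ndarray, n_groups: int) -> list:
    """Group features whose (hypothetical) per-feature embedding parameters are similar."""
    # weights[i] and biases[i] are the d-dimensional parameters that embed scalar feature i.
    descriptors = np.concatenate([weights, biases], axis=1)
    labels = kmeans(descriptors, n_groups)
    return [np.flatnonzero(labels == g).tolist() for g in range(n_groups)]
```

Farthest-point initialization keeps the sketch deterministic; a library implementation such as scikit-learn's `KMeans` would be the more usual choice in practice.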

Domain Generalization and Adaptation in Intensive Care with Anchor Regression

Jul 29, 2025

Malte Londschien, Manuel Burger, Gunnar Rätsch, Peter Bühlmann

Abstract: The performance of predictive models in clinical settings often degrades when they are deployed in new hospitals because of distribution shifts. This paper presents a large-scale study of causality-inspired domain generalization on heterogeneous multi-center intensive care unit (ICU) data. We apply anchor regression and introduce anchor boosting, a novel tree-based nonlinear extension, to a large dataset comprising 400,000 patients from nine distinct ICU databases. The anchor regularization consistently improves out-of-distribution performance, particularly for the most dissimilar target domains. The methods appear robust to violations of theoretical assumptions, such as anchor exogeneity. Furthermore, we propose a novel conceptual framework to quantify the utility of large external datasets. By evaluating performance as a function of available target-domain data, we identify three regimes: (i) a domain generalization regime, where only the external model should be used, (ii) a domain adaptation regime, where refitting the external model is optimal, and (iii) a data-rich regime, where external data provides no additional value.
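
Linear anchor regression is commonly written as the penalized least-squares problem: minimize ||(I − P_A)(y − Xb)||² + γ||P_A(y − Xb)||², where P_A projects onto the column span of the anchor variables A and γ ≥ 0 controls how strongly residual variation explained by the anchors is penalized (γ = 1 recovers ordinary least squares). Because the two terms act on orthogonal components of the residual, the problem reduces to ordinary least squares on data transformed by (I − P_A) + √γ·P_A. The minimal NumPy sketch below implements only this linear version; the paper's anchor boosting extension is not shown, and using, for example, one-hot indicators of the source ICU database as anchors is an illustrative assumption.

```python
import numpy as np


def anchor_regression(X: np.ndarray, y: np.ndarray, A: np.ndarray, gamma: float) -> np.ndarray:
    """Linear anchor regression via ordinary least squares on transformed data."""

    def project(M: np.ndarray) -> np.ndarray:
        # P_A @ M, computed with least squares instead of forming the n x n projection.
        coef, *_ = np.linalg.lstsq(A, M, rcond=None)
        return A @ coef

    def transform(M: np.ndarray) -> np.ndarray:
        # ((I - P_A) + sqrt(gamma) * P_A) @ M
        proj = project(M)
        return M - proj + np.sqrt(gamma) * proj

    beta, *_ = np.linalg.lstsq(transform(X), transform(y), rcond=None)
    return beta
```

Larger values of γ trade in-distribution fit for stability under shifts that act through the anchors, which is the mechanism behind the out-of-distribution gains the abstract reports.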

Towards Foundation Models for Critical Care Time Series

Nov 25, 2024

Manuel Burger, Fedor Sergeev, Malte Londschien, Daphné Chopard, Hugo Yèche, Eike Gerdes, Polina Leshetkina, Alexander Morgenroth, Zeynep Babür, Jasmina Bogojeska(+3 more)

Abstract: Notable progress has been made in generalist medical large language models across various healthcare areas. However, large-scale modeling of in-hospital time series data, such as vital signs, lab results, and treatments in critical care, remains underexplored. Existing datasets are relatively small, but combining them can enhance patient diversity and improve model robustness. To use these combined datasets effectively for large-scale modeling, it is essential to address the distribution shifts caused by varying treatment policies, which requires harmonizing treatment variables across the different datasets. This work aims to establish a foundation for training large-scale multivariate time series models on critical care data and to provide a benchmark for machine learning models in transfer learning across hospitals, in order to study and address distribution shift challenges. We introduce a harmonized dataset for sequence modeling and transfer learning research, representing the first large-scale collection to include core treatment variables. Future plans involve expanding this dataset to support further advances in transfer learning and the development of scalable, generalizable models for critical healthcare applications.

* Accepted for Oral Presentation at AIM-FM Workshop at NeurIPS 2024

Multi-Modal Contrastive Learning for Online Clinical Time-Series Applications

Mar 27, 2024

Fabian Baldenweg, Manuel Burger, Gunnar Rätsch, Rita Kuznetsova

Abstract: Electronic Health Record (EHR) datasets from Intensive Care Units (ICUs) contain a diverse set of data modalities. While prior work has successfully leveraged multiple modalities in supervised settings, we apply advanced self-supervised multi-modal contrastive learning techniques to ICU data, focusing specifically on clinical notes and time series for clinically relevant online prediction tasks. We introduce a loss function, the Multi-Modal Neighborhood Contrastive Loss (MM-NCL), together with a soft neighborhood function, and showcase the excellent linear-probe and zero-shot performance of our approach.

* Accepted as a Workshop Paper at TS4H@ICLR2024
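
The abstract does not spell out MM-NCL itself. As a generic, hedged illustration of the family of losses it belongs to, the sketch below computes a symmetric cross-modal contrastive loss between time-series and clinical-note embeddings in which each row of a non-negative `neighborhood` matrix acts as a soft target over which notes count as positives. With the identity matrix it reduces to the standard symmetric InfoNCE (CLIP-style) loss. The function names, the temperature value, and the construction of `neighborhood` are assumptions; this is not the paper's MM-NCL or its soft neighborhood function.

```python
import numpy as np


def log_softmax(z: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def soft_cross_modal_contrastive_loss(
    ts_emb: np.ndarray,
    note_emb: np.ndarray,
    neighborhood: np.ndarray,
    temperature: float = 0.1,
) -> float:
    """Symmetric contrastive loss with soft targets from a (hypothetical) neighborhood matrix."""
    # L2-normalize so that dot products are cosine similarities.
    ts = ts_emb / np.linalg.norm(ts_emb, axis=1, keepdims=True)
    notes = note_emb / np.linalg.norm(note_emb, axis=1, keepdims=True)
    logits = ts @ notes.T / temperature  # (batch, batch) cross-modal similarities
    # neighborhood[i, j] >= 0 weights note j as a positive for time-series sample i;
    # every row and column must have a positive sum. The identity gives standard InfoNCE.
    w = np.asarray(neighborhood, dtype=float)
    targets_ts = w / w.sum(axis=1, keepdims=True)
    targets_notes = w.T / w.T.sum(axis=1, keepdims=True)
    # Cross-entropy between soft targets and the softmax over the other modality, both directions.
    loss_ts = -(targets_ts * log_softmax(logits)).sum(axis=1).mean()
    loss_notes = -(targets_notes * log_softmax(logits.T)).sum(axis=1).mean()
    return float(0.5 * (loss_ts + loss_notes))
```

Softening the targets lets nearby samples (for instance, windows from the same patient stay) share probability mass instead of being treated as negatives, which is the general motivation behind neighborhood-based contrastive objectives.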

Dynamic Survival Analysis for Early Event Prediction

Mar 19, 2024

Hugo Yèche, Manuel Burger, Dinara Veshchezerova, Gunnar Rätsch

Abstract: This study advances Early Event Prediction (EEP) in healthcare through Dynamic Survival Analysis (DSA), offering a novel approach that integrates risk localization into alarm policies to improve clinical event metrics. By adapting DSA models and evaluating them against traditional EEP benchmarks, our research shows that they match EEP models at the time-step level and significantly improve event-level metrics through a new alarm prioritization scheme (up to an 11% difference in AuPRC). This approach represents a significant step forward in predictive healthcare, providing a more nuanced and actionable framework for early event prediction and management.
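
In a discrete-time dynamic survival model, the prediction at each time step can be read as a vector of hazards h_1, ..., h_H over the next H steps, where h_k is the probability of the event at step k given no event before it. The probability of an event within the horizon is then 1 − ∏(1 − h_k), and the per-step event probabilities h_k·∏_{j<k}(1 − h_j) indicate when the event is expected, which is one way to think about risk localization. The minimal sketch below computes both quantities plus a simple threshold alarm; the hazard parameterization, the function names and the `threshold` rule are illustrative assumptions, and the paper's alarm prioritization scheme is not reproduced.

```python
import numpy as np


def event_probability_within_horizon(hazards) -> float:
    """P(event within the horizon) = 1 - prod_k (1 - h_k)."""
    h = np.asarray(hazards, dtype=float)
    return float(1.0 - np.prod(1.0 - h))


def event_time_distribution(hazards) -> np.ndarray:
    """P(event exactly at step k) = h_k * prod_{j<k} (1 - h_j)."""
    h = np.asarray(hazards, dtype=float)
    # Probability of surviving all steps before step k (1.0 for the first step).
    survival_before = np.concatenate([[1.0], np.cumprod(1.0 - h)[:-1]])
    return h * survival_before


def raise_alarm(hazards, threshold: float) -> bool:
    """Hypothetical alarm policy: fire when the horizon-level risk reaches the threshold."""
    return event_probability_within_horizon(hazards) >= threshold
```

For hazards [0.1, 0.2, 0.5], the horizon risk is 1 − 0.9·0.8·0.5 = 0.64, and the largest share of that mass (0.36) falls on the last step; the per-step distribution always sums to the horizon risk.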

Learning Genomic Sequence Representations using Graph Neural Networks over De Bruijn Graphs

Dec 06, 2023

Kacper Kapuśniak, Manuel Burger, Gunnar Rätsch, Amir Joudaki

Abstract: The rapid expansion of genomic sequence data calls for new methods to achieve robust sequence representations. Existing techniques often neglect intricate structural details and emphasize mainly contextual information. To address this, we developed k-mer embeddings that merge contextual and structural string information by enhancing De Bruijn graphs with structural similarity connections. We then designed a self-supervised contrastive learning method that employs a heterogeneous Graph Convolutional Network encoder and constructs positive pairs based on node similarities. Our embeddings consistently outperform prior techniques on Edit Distance Approximation and Closest String Retrieval tasks.

* Poster at "NeurIPS 2023 New Frontiers in Graph Learning Workshop (NeurIPS GLFrontiers 2023)"
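
A De Bruijn graph over k-mers can be built by using each k-mer as a node and joining consecutive k-mers in a sequence, which overlap by k − 1 characters. The abstract adds structural similarity connections without specifying them, so the sketch below uses Hamming distance between k-mers as a hypothetical stand-in, giving a graph with two edge types. The function names and the similarity criterion are assumptions, and the heterogeneous GCN encoder and contrastive training are not shown.

```python
from collections import defaultdict


def de_bruijn_graph(sequences: list, k: int) -> dict:
    """Overlap edges between consecutive k-mers, counted over all sequences."""
    edges = defaultdict(int)
    for seq in sequences:
        # The k-mers starting at positions i and i + 1 share k - 1 characters.
        for i in range(len(seq) - k):
            edges[(seq[i:i + k], seq[i + 1:i + k + 1])] += 1
    return dict(edges)


def similarity_edges(kmers, max_hamming: int = 1) -> list:
    """Hypothetical structural-similarity edges: k-mers differing in at most `max_hamming` positions."""
    ordered = sorted(set(kmers))
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if sum(x != y for x, y in zip(a, b)) <= max_hamming:
                pairs.append((a, b))
    return pairs
```

The node set for the similarity edges can be taken from the overlap graph, e.g. `{u for edge in graph for u in edge}`; keeping the two edge lists separate is what makes the resulting graph heterogeneous.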

On the Importance of Step-wise Embeddings for Heterogeneous Clinical Time-Series

Nov 15, 2023

Rita Kuznetsova, Alizée Pace, Manuel Burger, Hugo Yèche, Gunnar Rätsch

Abstract: Recent advances in deep learning architectures for sequence modeling have not fully transferred to tasks involving time series from electronic health records. In particular, for problems related to the Intensive Care Unit (ICU), the state of the art still tackles sequence classification in a tabular manner with tree-based methods. Recent deep learning methods for tabular data now surpass these classical methods by better handling the severe heterogeneity of input features. Given the similar level of feature heterogeneity exhibited by ICU time series, and motivated by these findings, we explore the impact of these methods on clinical sequence modeling tasks. By jointly using such advances in deep learning for tabular data, our primary objective is to underscore the importance of step-wise embeddings in time-series modeling, which remain unexplored in machine learning methods for clinical data. On a variety of clinically relevant tasks from two large-scale ICU datasets, MIMIC-III and HiRID, our work provides an exhaustive analysis of state-of-the-art methods for tabular time series used as time-step embedding models, showing an overall performance improvement. In particular, we demonstrate the importance of feature grouping in clinical time series, with significant performance gains when features are considered within predefined semantic groups in the step-wise embedding module.

* Machine Learning for Health (ML4H) 2023 in Proceedings of Machine Learning Research 225
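
A step-wise embedding maps the feature vector observed at each time step to a dense vector before a sequence model processes the resulting sequence. To illustrate how feature grouping enters this module, the sketch below gives each group of feature indices (for example, a predefined semantic group) its own small embedding, a linear layer followed by ReLU, so that each group's output depends only on its own features, and concatenates the group outputs. The class name, the per-group linear layer and the random untrained weights are assumptions; the paper evaluates tabular deep learning architectures in this role, which are not reproduced here.

```python
import numpy as np


class GroupedStepEmbedding:
    """Per-time-step embedding in which each feature group has its own linear layer + ReLU."""

    def __init__(self, groups, d_group: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.groups = [np.asarray(g, dtype=int) for g in groups]
        # Randomly initialized, untrained parameters; a real model would learn these.
        self.weights = [
            rng.normal(0.0, 1.0 / np.sqrt(len(g)), size=(len(g), d_group)) for g in self.groups
        ]
        self.biases = [np.zeros(d_group) for _ in self.groups]

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, time, n_features); each group only sees the columns it owns.
        parts = [
            np.maximum(x[..., g] @ w + b, 0.0)
            for g, w, b in zip(self.groups, self.weights, self.biases)
        ]
        # Output: (batch, time, n_groups * d_group), ready for a sequence model.
        return np.concatenate(parts, axis=-1)
```

Because each group's parameters touch only its own columns, perturbing one feature changes only the slice of the step embedding that belongs to that feature's group.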

Knowledge Graph Representations to enhance Intensive Care Time-Series Predictions

Nov 13, 2023

Samyak Jain, Manuel Burger, Gunnar Rätsch, Rita Kuznetsova

Abstract: Intensive Care Units (ICUs) require comprehensive integration of patient data to improve clinical outcome predictions, which are crucial for assessing patient conditions. Recent deep learning advances have used patient time series data, and fusion models have incorporated unstructured clinical reports, improving predictive performance. However, integrating established medical knowledge into these models has not yet been explored. Medical data, rich in structural relationships, can be harnessed for better predictions through knowledge graphs derived from clinical ontologies such as the Unified Medical Language System (UMLS). Our proposed methodology integrates this knowledge with ICU data, improving clinical decision modeling. It combines graph representations with vital signs and clinical reports, enhancing performance, especially when data are missing. Additionally, our model includes an interpretability component to understand how knowledge graph nodes affect predictions.

* Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 11 pages

Language Model Training Paradigms for Clinical Feature Embeddings

Nov 01, 2023

Yurong Hu, Manuel Burger, Gunnar Rätsch, Rita Kuznetsova

Abstract: In research areas with scarce data, representation learning plays a significant role. This work aims to enhance representation learning for clinical time series by deriving universal embeddings for clinical features, such as heart rate and blood pressure. We use self-supervised training paradigms for language models to learn high-quality clinical feature embeddings, achieving a finer granularity than existing time-step and patient-level representation learning. We visualize the learnt embeddings via unsupervised dimension reduction techniques and observe a high degree of consistency with prior clinical knowledge. We also evaluate the model performance on the MIMIC-III benchmark and demonstrate the effectiveness of using clinical feature embeddings. We publish our code online for replication.

* Poster at "NeurIPS 2023 Workshop: Self-Supervised Learning - Theory and Practice"

Multi-modal Graph Learning over UMLS Knowledge Graphs

Jul 10, 2023

Manuel Burger, Gunnar Rätsch, Rita Kuznetsova

Abstract: Clinicians are increasingly looking towards machine learning to gain insights into how patients' conditions evolve. We propose a novel approach named Multi-Modal UMLS Graph Learning (MMUGL) for learning meaningful representations of medical concepts using graph neural networks over knowledge graphs based on the Unified Medical Language System (UMLS). These representations are aggregated to represent entire patient visits and are then fed into a sequence model to perform predictions at the granularity of multiple hospital visits of a patient. We improve performance by incorporating prior medical knowledge and considering multiple modalities. We compare our method to existing architectures proposed to learn representations at different granularities on the MIMIC-III dataset and show that our approach outperforms these methods. The results demonstrate the significance of multi-modal medical concept representations based on prior medical knowledge.
