Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

John M. Still

Cryptogenic stroke and migraine: using probabilistic independence and machine learning to uncover latent sources of disease from the electronic health record

Apr 22, 2025

Joshua W. Betts, John M. Still, Thomas A. Lasko

Abstract:Migraine is a common but complex neurological disorder that doubles the lifetime risk of cryptogenic stroke (CS). However, this relationship remains poorly characterized, and few clinical guidelines exist to reduce this associated risk. We therefore propose a data-driven approach to extract probabilistically-independent sources from electronic health record (EHR) data and create a 10-year risk-predictive model for CS in migraine patients. These sources represent external latent variables acting on the causal graph constructed from the EHR data and approximate root causes of CS in our population. A random forest model trained on patient expressions of these sources demonstrated good accuracy (ROC 0.771) and identified the top 10 most predictive sources of CS in migraine patients. These sources revealed that pharmacologic interventions were the most important factor in minimizing CS risk in our population and identified a factor related to allergic rhinitis as a potential causative source of CS in migraine patients.

* 10 pages, 5 figures, 1 table, LaTeX. Submitted as a student paper to the American Medical Informatics Association 2025 Annual Symposium for presentation

Via

Access Paper or Ask Questions

A data-driven approach to discover and quantify systemic lupus erythematosus etiological heterogeneity from electronic health records

Jan 13, 2025

Marco Barbero Mota, John M. Still, Jorge L. Gamboa, Eric V. Strobl, Charles M. Stein, Vivian K. Kawai, Thomas A. Lasko

Abstract:Systemic lupus erythematosus (SLE) is a complex heterogeneous disease with many manifestational facets. We propose a data-driven approach to discover probabilistic independent sources from multimodal imperfect EHR data. These sources represent exogenous variables in the data generation process causal graph that estimate latent root causes of the presence of SLE in the health record. We objectively evaluated the sources against the original variables from which they were discovered by training supervised models to discriminate SLE from negative health records using a reduced set of labelled instances. We found 19 predictive sources with high clinical validity and whose EHR signatures define independent factors of SLE heterogeneity. Using the sources as input patient data representation enables models to provide with rich explanations that better capture the clinical reasons why a particular record is (not) an SLE case. Providers may be willing to trade patient-level interpretability for discrimination especially in challenging cases.

* Received Runner-up Knowledge Discovery and Data Mining Innovation Award at the American Medical Informatics Association Annual Symposium 2024

Via

Access Paper or Ask Questions

Unsupervised Discovery of Clinical Disease Signatures Using Probabilistic Independence

Feb 08, 2024

Thomas A. Lasko, John M. Still, Thomas Z. Li, Marco Barbero Mota, William W. Stead, Eric V. Strobl, Bennett A. Landman, Fabien Maldonado

Abstract:Insufficiently precise diagnosis of clinical disease is likely responsible for many treatment failures, even for common conditions and treatments. With a large enough dataset, it may be possible to use unsupervised machine learning to define clinical disease patterns more precisely. We present an approach to learning these patterns by using probabilistic independence to disentangle the imprint on the medical record of causal latent sources of disease. We inferred a broad set of 2000 clinical signatures of latent sources from 9195 variables in 269,099 Electronic Health Records. The learned signatures produced better discrimination than the original variables in a lung cancer prediction task unknown to the inference algorithm, predicting 3-year malignancy in patients with no history of cancer before a solitary lung nodule was discovered. More importantly, the signatures' greater explanatory power identified pre-nodule signatures of apparently undiagnosed cancer in many of those patients.

* 29 Pages, 8 figures

Via

Access Paper or Ask Questions

Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification

Apr 10, 2023

Thomas Z. Li, John M. Still, Kaiwen Xu, Ho Hin Lee, Leon Y. Cai, Aravind R. Krishnan, Riqiang Gao, Mirza S. Khan, Sanja Antic, Michael Kammer(+4 more)

Figure 1 for Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification

Figure 2 for Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification

Figure 3 for Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification

Figure 4 for Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification

Abstract:The accuracy of predictive models for solitary pulmonary nodule (SPN) diagnosis can be greatly increased by incorporating repeat imaging and medical context, such as electronic health records (EHRs). However, clinically routine modalities such as imaging and diagnostic codes can be asynchronous and irregularly sampled over different time scales which are obstacles to longitudinal multimodal learning. In this work, we propose a transformer-based multimodal strategy to integrate repeat imaging with longitudinal clinical signatures from routinely collected EHRs for SPN classification. We perform unsupervised disentanglement of latent clinical signatures and leverage time-distance scaled self-attention to jointly learn from clinical signatures expressions and chest computed tomography (CT) scans. Our classifier is pretrained on 2,668 scans from a public dataset and 1,149 subjects with longitudinal chest CTs, billing codes, medications, and laboratory tests from EHRs of our home institution. Evaluation on 227 subjects with challenging SPNs revealed a significant AUC improvement over a longitudinal multimodal baseline (0.824 vs 0.752 AUC), as well as improvements over a single cross-section multimodal scenario (0.809 AUC) and a longitudinal imaging-only scenario (0.741 AUC). This work demonstrates significant advantages with a novel approach for co-learning longitudinal imaging and non-imaging phenotypes with transformers.

Via

Access Paper or Ask Questions