Abstract:Most machine learning (ML) models are developed for prediction only, offering no option for causal interpretation of their predictions or parameters/properties. This can hamper health systems' ability to employ ML models in clinical decision-making processes, where the need and desire for predicting outcomes under hypothetical interventions (i.e., counterfactual reasoning/explanation) is high. In this research, we introduce a new representation learning framework (i.e., partial concept bottleneck), which treats the provision of counterfactual explanations as an embedded property of the risk model. Despite the architectural changes necessary for jointly optimising for prediction accuracy and counterfactual reasoning, the accuracy of our approach is comparable to that of prediction-only models. Our results suggest that our proposed framework has the potential to help researchers and clinicians improve personalised care (e.g., by investigating the hypothetical differential effects of interventions).
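The abstract does not spell out the architecture, but a minimal sketch of how a "partial" concept bottleneck might be wired is shown below: a shared encoder feeds both an interpretable concept head and an unconstrained residual branch, and the risk head sees both, so concepts can be intervened on for counterfactual queries. All names, sizes, and the loss weighting are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PartialConceptBottleneck(nn.Module):
    """Sketch (assumed design): joint outcome prediction with a concept head.

    A shared encoder feeds (a) a bottleneck that predicts human-defined
    concepts and (b) an unconstrained residual branch; the risk head sees
    both, so predictions can be probed by intervening on the concepts.
    """
    def __init__(self, n_features, n_concepts, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.concept_head = nn.Linear(hidden, n_concepts)  # interpretable part
        self.residual = nn.Linear(hidden, hidden)          # the "partial" escape hatch
        self.risk_head = nn.Linear(n_concepts + hidden, 1)

    def forward(self, x, concepts_override=None):
        h = self.encoder(x)
        c = torch.sigmoid(self.concept_head(h))
        if concepts_override is not None:   # counterfactual: set concepts by hand
            c = concepts_override
        return c, self.risk_head(torch.cat([c, self.residual(h)], dim=-1))

# Joint loss: outcome accuracy + concept supervision (0.5 weight is an assumption).
model = PartialConceptBottleneck(n_features=100, n_concepts=10)
x = torch.randn(32, 100)
y, c_true = torch.rand(32, 1).round(), torch.rand(32, 10).round()
c_hat, y_logit = model(x)
loss = nn.functional.binary_cross_entropy_with_logits(y_logit, y) \
     + 0.5 * nn.functional.binary_cross_entropy(c_hat, c_true)
```

Passing `concepts_override` with edited concept values yields the model's prediction under that hypothetical intervention, which is the counterfactual-explanation behaviour the abstract describes.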
Abstract:Observational causal inference is useful for decision making in medicine when randomized clinical trials (RCTs) are infeasible or not generalizable. However, traditional approaches often fail to deliver unconfounded causal conclusions in practice. The rise of "doubly robust" non-parametric tools, coupled with the growth of deep learning for capturing rich representations of multimodal data, offers a unique opportunity to develop and test such models for causal inference on comprehensive electronic health records (EHR). In this paper, we investigate causal modelling of an RCT-established null causal association: the effect of antihypertensive use on incident cancer risk. We develop a dataset for our observational study and a Transformer-based model, Targeted BEHRT, which, coupled with doubly robust estimation, estimates the average risk ratio (RR). We compare our model to benchmark statistical and deep learning models for causal inference in multiple experiments on semi-synthetic derivations of our dataset with various types and intensities of confounding. To further test the reliability of our approach, we evaluate our model in situations of limited data. We find that our model provides more accurate estimates of the RR (least sum of absolute errors from the ground truth) compared to benchmarks for risk ratio estimation on high-dimensional EHR across experiments. Finally, we apply our model to investigate the original case study, antihypertensives' effect on cancer, and demonstrate that our model generally captures the validated null association.
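The paper's exact estimator is its own, but a generic AIPW-style doubly robust risk-ratio estimator, of the kind the abstract refers to, looks roughly like the sketch below. The nuisance predictions `m1`, `m0` (outcome model) and `e` (propensity model) would come from the learned representation; here they are stand-ins.

```python
import numpy as np

def doubly_robust_rr(y, t, m1, m0, e, eps=0.01):
    """AIPW-style doubly robust estimate of the average risk ratio.

    y : observed binary outcomes; t : treatment indicators
    m1, m0 : outcome-model predictions P(Y=1 | X, T=1) and P(Y=1 | X, T=0)
    e : propensity scores P(T=1 | X), clipped for numerical stability
    Consistent if either the outcome model or the propensity model is correct.
    """
    e = np.clip(e, eps, 1 - eps)
    ey1 = np.mean(m1 + t / e * (y - m1))              # bias-corrected E[Y(1)]
    ey0 = np.mean(m0 + (1 - t) / (1 - e) * (y - m0))  # bias-corrected E[Y(0)]
    return ey1 / ey0

# Toy usage with made-up nuisance estimates under a null association.
rng = np.random.default_rng(0)
n = 1000
t = rng.integers(0, 2, n)
y = rng.binomial(1, 0.1, n)
m1, m0, e = np.full(n, 0.1), np.full(n, 0.1), np.full(n, 0.5)
print(doubly_robust_rr(y, t, m1, m0, e))  # ~1.0, i.e., no effect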
Abstract:Deep learning models have shown tremendous potential in learning representations that capture key properties of the data. This makes them great candidates for transfer learning: exploiting commonalities between different learning tasks to transfer knowledge from one task to another. Electronic health records (EHR) research is one of the domains that has witnessed a growing number of deep learning techniques employed for learning clinically meaningful representations of medical concepts (such as diseases and medications). Despite this growth, the approaches to benchmark and assess such learned representations (or embeddings) are under-investigated; this can be a big issue when such embeddings are shared to facilitate transfer learning. In this study, we aim to (1) train some of the most prominent disease embedding techniques on comprehensive EHR data from 3.1 million patients, (2) employ qualitative and quantitative evaluation techniques to assess these embeddings, and (3) provide pre-trained disease embeddings for transfer learning. This study can serve as the first comprehensive approach to clinical concept embedding evaluation, and it can be applied to any embedding technique and any EHR concept.
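One common quantitative check for such embeddings, of the kind the abstract's evaluation suite could include, is whether clinically related concepts land near each other in the embedding space. The sketch below scores a set of known related pairs by nearest-neighbour overlap under cosine similarity; the concept names and pair list are illustrative, not the paper's benchmark.

```python
import numpy as np

def neighbour_overlap(emb, known_pairs, k=10):
    """For each (disease, related-disease) pair drawn from clinical knowledge,
    check whether the second concept appears among the first's k nearest
    neighbours by cosine similarity. emb: {concept_name: vector}."""
    names = list(emb)
    mat = np.stack([emb[n] for n in names])
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    sims = mat @ mat.T
    hits = 0
    for a, b in known_pairs:
        i = names.index(a)
        topk = [names[j] for j in np.argsort(-sims[i])[1:k + 1]]  # skip self
        hits += b in topk
    return hits / len(known_pairs)

# Tiny toy vocabulary with random vectors, purely to show the interface.
rng = np.random.default_rng(0)
emb = {c: rng.standard_normal(16) for c in ["HF", "MI", "CKD", "T2DM"]}
print(neighbour_overlap(emb, [("HF", "MI"), ("T2DM", "CKD")], k=2))
```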
Abstract:Electronic health records represent a holistic overview of patients' trajectories. Their increasing availability has fueled new hopes of leveraging them to develop accurate risk prediction models for a wide range of diseases. Given the complex interrelationships of medical records and patient outcomes, deep learning models have shown clear merits in achieving this goal. However, a key limitation of these models remains their limited capacity for processing long sequences. Capturing the whole history of medical encounters is expected to lead to more accurate predictions, but the inclusion of records collected over decades and from multiple sources can inevitably exceed the receptive field of existing deep learning architectures. This can result in missing crucial, long-term dependencies. To address this gap, we present Hi-BEHRT, a hierarchical Transformer-based model that can significantly expand the receptive field of Transformers and extract associations from much longer sequences. Using large-scale, multimodal, linked longitudinal electronic health records, Hi-BEHRT exceeds the state-of-the-art BEHRT by 1% to 5% in area under the receiver operating characteristic curve (AUROC) and 3% to 6% in area under the precision-recall curve (AUPRC) on average, and by 3% to 6% (AUROC) and 3% to 11% (AUPRC) for patients with a long medical history, on 5-year heart failure, diabetes, chronic kidney disease, and stroke risk prediction. Additionally, because pretraining for hierarchical Transformers is not well established, we provide an effective end-to-end contrastive pre-training strategy for Hi-BEHRT using EHR, improving its transferability for predicting clinical events with relatively small training datasets.
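The core hierarchical idea can be sketched in a few lines of PyTorch: a local Transformer summarises each fixed-size window of the encounter sequence, and a global Transformer then attends over the window summaries, extending the effective receptive field well beyond a single encoder's limit. Window size, depth, and pooling below are illustrative assumptions, not Hi-BEHRT's published configuration.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Sketch of a two-level (local + global) Transformer over long sequences."""
    def __init__(self, d=64, window=50, heads=4):
        super().__init__()
        self.window = window
        make = lambda: nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, heads, batch_first=True), num_layers=2)
        self.local, self.globl = make(), make()

    def forward(self, x):                      # x: (batch, seq_len, d)
        b, t, d = x.shape
        pad = (-t) % self.window               # pad so the sequence splits evenly
        x = nn.functional.pad(x, (0, 0, 0, pad))
        wins = x.view(b * (x.shape[1] // self.window), self.window, d)
        summaries = self.local(wins).mean(dim=1)       # one vector per window
        summaries = summaries.view(b, -1, d)
        return self.globl(summaries).mean(dim=1)       # patient representation

enc = HierarchicalEncoder()
print(enc(torch.randn(2, 730, 64)).shape)  # torch.Size([2, 64])
```

A plain encoder would need attention over all 730 positions at once; here each attention computation sees at most 50 windows or 50 events, which is what makes decades-long histories tractable.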
Abstract:Recent evidence shows that deep learning models trained on electronic health records from millions of patients can deliver substantially more accurate predictions of risk compared to their statistical counterparts. While this provides an important opportunity for improving clinical decision-making, the lack of interpretability is a major barrier to the incorporation of these black-box models in routine care, limiting their trustworthiness and preventing further hypothesis-testing investigations. In this study, we propose two methods, namely model distillation and variable selection, to untangle hidden patterns learned by an established deep learning model (BEHRT) for risk association identification. Owing to its clinical importance and its diversity as a phenotype, heart failure was used to showcase the merits of the proposed methods. A cohort of 788,880 patients (8.3% with incident heart failure) was considered for the study. Model distillation identified 598 and 379 diseases that were associated and dissociated with heart failure at the population level, respectively. While the associations were broadly consistent with prior knowledge, our method also highlighted several less appreciated links that merit further investigation. In addition to these important population-level insights, we developed an approach to individual-level interpretation to account for the varying manifestations of heart failure in clinical practice. This was achieved through variable selection, by detecting a minimal set of encounters that can maximally preserve the accuracy of prediction for individuals. Our proposed work provides a discovery-enabling tool to identify risk factors at both the population and individual levels from a data-driven perspective. This helps to generate new hypotheses and guide further investigation of causal links.
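The general shape of model distillation for association mining is to fit a transparent "student" on the black-box model's predictions and read associations off the student's coefficients. The sketch below uses a sparse logistic regression as the student and a synthetic stand-in for BEHRT's output; the paper's actual distillation target and student model are its own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# X: binary patient-by-disease history matrix (illustrative shapes).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(5000, 200)).astype(float)

# Stand-in for the teacher: BEHRT's predicted heart-failure risk per patient.
teacher_prob = 1 / (1 + np.exp(-(X[:, 0] - X[:, 1])))

# Student: sparse, interpretable model distilled from the teacher's outputs.
student = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
student.fit(X, (teacher_prob > 0.5).astype(int))

coef = student.coef_.ravel()
associated = np.where(coef > 0)[0]    # diseases pushing risk up, per the student
dissociated = np.where(coef < 0)[0]   # diseases pushing risk down
print(len(associated), len(dissociated))
```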
Abstract:Predicting the incidence of complex chronic conditions such as heart failure is challenging. Deep learning models applied to rich electronic health records may improve prediction, but they remain unexplainable, hampering their wider use in medical practice. We developed a novel Transformer-based deep learning model for more accurate and yet explainable prediction of incident heart failure, involving 100,071 patients from longitudinal linked electronic health records across the UK. On internal 5-fold cross-validation and held-out external validation, our model achieved 0.93 and 0.93 area under the receiver operating characteristic curve, and 0.69 and 0.70 area under the precision-recall curve, respectively, and outperformed existing deep learning models. Predictor groups included all community and hospital diagnoses and medications, contextualised within the age and calendar year of each patient's clinical encounters. The importance of contextualised medical information was revealed in a number of sensitivity analyses, and our perturbation method provided a way of identifying factors contributing to risk. Many of the identified risk factors were consistent with existing knowledge from clinical and epidemiological research, but several new associations were revealed that had not been considered in expert-driven risk prediction models.
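A perturbation method of the kind mentioned here typically works by occluding one input at a time and measuring the change in predicted risk. The sketch below applies this pattern to a sequence of encounter codes; the masking scheme, interface, and toy model are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn as nn

def perturbation_importance(model, records, mask_token=0):
    """Occlusion-style sketch: mask one encounter at a time and measure the
    drop in predicted risk; positive scores mark encounters that add risk."""
    base = model(records).item()
    scores = []
    for i in range(records.shape[1]):
        perturbed = records.clone()
        perturbed[0, i] = mask_token          # "remove" the i-th encounter
        scores.append(base - model(perturbed).item())
    return scores

# Toy model: embeds encounter codes and pools them into a risk score.
emb = nn.Embedding(100, 8)
head = nn.Linear(8, 1)
model = lambda r: torch.sigmoid(head(emb(r).mean(dim=1))).squeeze()

records = torch.randint(1, 100, (1, 20))      # one patient, 20 encounter codes
print(perturbation_importance(model, records))
```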
Abstract:One major impediment to the wider use of deep learning for clinical decision making is the difficulty of assigning a level of confidence to model predictions. Currently, deep Bayesian neural networks and sparse Gaussian processes are the two main scalable uncertainty estimation methods. However, deep Bayesian neural networks suffer from a lack of expressiveness, and more expressive models, such as deep kernel learning (an extension of the sparse Gaussian process), capture only the uncertainty from the higher-level latent space. As a result, the deep learning model underneath lacks interpretability and ignores uncertainty from the raw data. In this paper, we merge features of the deep Bayesian learning framework with deep kernel learning to leverage the strengths of both methods for more comprehensive uncertainty estimation. Through a series of experiments on predicting the first incidence of heart failure, diabetes, and depression from large-scale electronic medical records, we demonstrate that our method is better at capturing uncertainty than both Gaussian processes and deep Bayesian neural networks in terms of indicating data insufficiency and distinguishing true positive from false positive predictions, with comparable generalisation performance. Furthermore, by assessing accuracy and area under the receiver operating characteristic curve over the predictive probability, we show that our method is less susceptible to making overconfident predictions, especially for the minority class in imbalanced datasets. Finally, we demonstrate how uncertainty information derived by the model can inform risk factor analysis towards model interpretability.
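To make the notion of predictive uncertainty concrete: sampling-based Bayesian approximations attach a spread to each risk prediction, so confident and uncertain cases can be separated. The sketch below uses Monte Carlo dropout, one of the simplest such approximations, purely as an illustration; the paper's actual method combines deep kernel learning with Bayesian layers and is more involved.

```python
import torch
import torch.nn as nn

class MCDropoutNet(nn.Module):
    """Illustrative only: MC dropout as a cheap predictive-uncertainty proxy."""
    def __init__(self, n_in, hidden=64, p=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, hidden), nn.ReLU(), nn.Dropout(p),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def predict(self, x, samples=50):
        self.train()                      # keep dropout active at test time
        with torch.no_grad():
            draws = torch.stack([self.net(x) for _ in range(samples)])
        return draws.mean(0), draws.std(0)  # risk estimate and its uncertainty

model = MCDropoutNet(n_in=100)
mean, std = model.predict(torch.randn(8, 100))
print(mean.squeeze(), std.squeeze())      # high std flags unreliable predictions
```

Flagging predictions whose standard deviation exceeds a threshold is one way such uncertainty can indicate data insufficiency or likely false positives, as described in the abstract.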
Abstract:Today, despite decades of developments in medicine and the growing interest in precision healthcare, the vast majority of diagnoses happen once patients begin to show noticeable signs of illness. Early indication and detection of diseases, however, can provide patients and carers with the chance of early intervention, better disease management, and efficient allocation of healthcare resources. The latest developments in machine learning (more specifically, deep learning) provide a great opportunity to address this unmet need. In this study, we introduce BEHRT: a deep neural sequence transduction model for EHR (electronic health records), capable of multitask prediction and disease trajectory mapping. When trained and evaluated on data from nearly 1.6 million individuals, BEHRT shows a striking absolute improvement of 8.0-10.8% in average precision score over the existing state-of-the-art deep EHR models when predicting the onset of 301 conditions. In addition to its superior prediction power, BEHRT provides a personalised view of disease trajectories through its attention mechanism; its flexible architecture enables it to incorporate multiple heterogeneous concepts (e.g., diagnoses, medications, measurements, and more) to improve the accuracy of its predictions; and its (pre-)training results in disease and patient representations that can help us get a step closer to interpretable predictions.
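The key input idea behind a BEHRT-style model is that each medical event is embedded together with its context (age at the encounter, visit segment, position in the sequence), and the summed embeddings feed a Transformer encoder. The sketch below captures that pattern; vocabulary sizes, dimensions, and depth are illustrative assumptions, not the published configuration.

```python
import torch
import torch.nn as nn

class BEHRTStyleInput(nn.Module):
    """Sketch: contextualised event embeddings feeding a Transformer encoder."""
    def __init__(self, n_codes=2000, n_ages=120, d=64):
        super().__init__()
        self.code = nn.Embedding(n_codes, d)   # diagnosis/medication codes
        self.age = nn.Embedding(n_ages, d)     # age at each encounter
        self.seg = nn.Embedding(2, d)          # alternating visit segments
        self.pos = nn.Embedding(512, d)        # position in the sequence
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), num_layers=2)

    def forward(self, codes, ages, segs):
        pos = torch.arange(codes.shape[1]).expand_as(codes)
        h = self.code(codes) + self.age(ages) + self.seg(segs) + self.pos(pos)
        return self.encoder(h)                 # contextualised event representations

m = BEHRTStyleInput()
codes = torch.randint(0, 2000, (2, 30))
ages, segs = torch.randint(0, 120, (2, 30)), torch.randint(0, 2, (2, 30))
print(m(codes, ages, segs).shape)  # torch.Size([2, 30, 64])
```

Task heads for multitask disease prediction would sit on top of these contextualised representations, and the encoder's attention weights are what supports the personalised trajectory view mentioned in the abstract.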
Abstract:Multimorbidity, or the presence of several medical conditions in the same individual, has been increasing in the population, both in absolute and relative terms. However, multimorbidity remains poorly understood, and the evidence from existing research describing its burden, determinants, and consequences has been limited. Many of these studies are cross-sectional and do not explicitly account for the evolution of multimorbidity patterns over time; others were based on small datasets, used arbitrary or narrow age ranges, or lacked appropriate clinical validation. In this study, we applied Non-negative Matrix Factorisation (NMF) in a novel way to one of the largest electronic health records (EHR) databases in the world (with 4 million patients) to simultaneously model disease clusters and their role in one's multimorbidity over time. Furthermore, we demonstrate how the temporal characteristics that our model associates with each disease cluster can help mine disease trajectories/networks and generate new hypotheses for the formation of multimorbidity clusters as a function of time/ageing. Our results suggest that our method's ability to learn the underlying dynamics of diseases can provide the field with a novel data-driven, exploratory way of learning the patterns of multimorbidity and their interactions over time.
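As background on the technique: NMF factorises a non-negative patient-by-disease matrix into additive parts, so each component reads as a disease cluster and each patient as a mixture of clusters. The sketch below shows the basic decomposition with scikit-learn; the matrix construction and any age/time stratification used to obtain temporal dynamics are assumptions here, not the paper's exact pipeline.

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy (patients x diseases) matrix of disease counts; shapes are illustrative.
rng = np.random.default_rng(0)
X = rng.poisson(0.05, size=(4000, 300)).astype(float)

model = NMF(n_components=20, init="nndsvd", max_iter=500)
patient_loadings = model.fit_transform(X)   # patients x clusters: mixture weights
disease_clusters = model.components_        # clusters x diseases: cluster makeup

top = np.argsort(-disease_clusters[0])[:5]  # most prominent diseases in cluster 0
print(top)
```

Repeating the factorisation across age bands (or otherwise encoding time in the matrix) is one way cluster loadings could be tracked over the life course, which is the kind of temporal characterisation the abstract describes.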