Abstract: Deep learning models have shown tremendous potential for learning representations that capture key properties of the data. This makes them strong candidates for transfer learning: exploiting commonalities between different learning tasks to transfer knowledge from one task to another. Electronic health records (EHR) research is one of the domains that has witnessed a growing number of deep learning techniques employed for learning clinically meaningful representations of medical concepts (such as diseases and medications). Despite this growth, approaches to benchmark and assess such learned representations (or embeddings) are under-investigated; this can be a major issue when such embeddings are shared to facilitate transfer learning. In this study, we aim to (1) train some of the most prominent disease embedding techniques on comprehensive EHR data from 3.1 million patients, (2) employ qualitative and quantitative evaluation techniques to assess these embeddings, and (3) provide pre-trained disease embeddings for transfer learning. This study may be the first comprehensive approach to clinical concept embedding evaluation, and it can be applied to any embedding technique and to any EHR concept.
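As a rough illustration of one kind of quantitative check for disease embeddings, the sketch below ranks concepts by cosine similarity to a query concept; a clinically meaningful embedding should place related conditions close together. The concept names, the 3-dimensional vectors, and the helpers `cosine` and `nearest_neighbours` are hypothetical and chosen only for illustration; this is not the evaluation suite used in the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbours(embeddings, query, k=2):
    """Return the k concepts most similar to `query`, excluding the query itself."""
    q = embeddings[query]
    scores = [(name, cosine(q, vec)) for name, vec in embeddings.items() if name != query]
    scores.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scores[:k]]

# Hypothetical toy embeddings; real disease embeddings would be learned from EHR data.
toy = {
    "type_2_diabetes": [0.9, 0.1, 0.0],
    "obesity": [0.8, 0.2, 0.1],
    "hypertension": [0.6, 0.5, 0.1],
    "asthma": [0.0, 0.1, 0.9],
}

print(nearest_neighbours(toy, "type_2_diabetes"))  # prints ['obesity', 'hypertension']
```

In a real evaluation, such neighbour lists would typically be compared against clinically curated groupings of related conditions rather than inspected by eye.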
Abstract: Today, despite decades of developments in medicine and the growing interest in precision healthcare, the vast majority of diagnoses happen once patients begin to show noticeable signs of illness. Early indication and detection of diseases, however, can provide patients and carers with the chance of early intervention, better disease management, and efficient allocation of healthcare resources. The latest developments in machine learning (more specifically, deep learning) provide a great opportunity to address this unmet need. In this study, we introduce BEHRT: a deep neural sequence transduction model for electronic health records (EHR), capable of multitask prediction and disease trajectory mapping. When trained and evaluated on data from nearly 1.6 million individuals, BEHRT shows a striking absolute improvement of 8.0-10.8% in average precision score over existing state-of-the-art deep EHR models when predicting the onset of 301 conditions. In addition to its superior predictive power, BEHRT provides a personalised view of disease trajectories through its attention mechanism; its flexible architecture enables it to incorporate multiple heterogeneous concepts (e.g., diagnoses, medications, measurements, and more) to improve the accuracy of its predictions; and its (pre-)training results in disease and patient representations that can help us get a step closer to interpretable predictions.
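To make the reported metric concrete, the sketch below computes non-interpolated average precision for a single condition (the mean of the precision values observed at the rank of each true positive) and then averages it over several conditions. The helper names are hypothetical, ties are broken by input order (library implementations usually group tied scores), and the macro average across conditions is an assumption, since the abstract does not state how scores for the 301 conditions are aggregated.

```python
def average_precision(y_true, y_score):
    """Non-interpolated average precision for one binary label.

    Examples are ranked by descending score; precision is recorded at the
    rank of each positive example and the recorded values are averaged.
    Ties keep their input order (sorted() is stable), an assumption of this sketch.
    """
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(labels_per_task, scores_per_task):
    """Macro-average AP over tasks (assumed aggregation, one task per condition)."""
    aps = [average_precision(t, s) for t, s in zip(labels_per_task, scores_per_task)]
    return sum(aps) / len(aps)

# Hypothetical toy predictions for two conditions.
labels = [[1, 0, 1, 0], [0, 1]]
scores = [[0.9, 0.8, 0.7, 0.1], [0.2, 0.6]]
print(average_precision(labels[0], scores[0]))  # (1/1 + 2/3) / 2
print(mean_average_precision(labels, scores))
```

Because average precision depends only on the ranking of positives, it is well suited to rare-onset conditions where accuracy would be dominated by the negative class.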
Abstract: Multimorbidity, or the presence of several medical conditions in the same individual, has been increasing in the population in both absolute and relative terms. However, multimorbidity remains poorly understood, and the evidence from existing research describing its burden, determinants, and consequences has been limited. Many of these studies are cross-sectional and do not explicitly account for the evolution of multimorbidity patterns over time. Some studies were based on small datasets, used arbitrary or narrow age ranges, or lacked appropriate clinical validation. In this study, we applied Non-negative Matrix Factorisation (NMF) in a novel way to one of the largest electronic health records (EHR) databases in the world (with 4 million patients), simultaneously modelling disease clusters and their role in an individual's multimorbidity over time. Furthermore, we demonstrated how the temporal characteristics that our model associates with each disease cluster can help mine disease trajectories/networks and generate new hypotheses about the formation of multimorbidity clusters as a function of time/ageing. Our results suggest that our method's ability to learn the underlying dynamics of diseases can provide the field with a novel data-driven, exploratory way of learning the patterns of multimorbidity and their interactions over time.
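For readers unfamiliar with NMF, the sketch below shows plain (non-temporal) NMF with the Lee-Seung multiplicative updates for squared Frobenius loss, factorising a toy patient-by-disease matrix V into W (patient loadings on clusters) and H (disease composition of each cluster). It is not the novel temporal formulation described above; the toy matrix, the rank, and the helper names `matmul`, `transpose`, `nmf`, and `reconstruction_error` are hypothetical and for illustration only.

```python
import random

def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Standard NMF via multiplicative updates: V (n x m) ~ W (n x rank) @ H (rank x m).

    Updates keep all entries non-negative because they only multiply by
    non-negative ratios; eps guards against division by zero.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    return w, h

def reconstruction_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))

# Hypothetical toy data: rows are patients, columns are diabetes, obesity, asthma, eczema.
V = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 1, 1],
]
W, H = nmf(V, rank=2)
print(round(reconstruction_error(V, W, H), 4))
print([[round(x, 2) for x in row] for row in H])  # each row: one candidate disease cluster
```

Each row of H can be read as a candidate disease cluster; the study's approach additionally ties these clusters to time/age, which this static sketch does not attempt.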