Abstract: We study the problem of detecting adverse drug events in electronic healthcare records. The challenge in this work is to aggregate heterogeneous data types involving diagnosis codes, drug codes, and lab measurements. An earlier framework proposed for the same problem demonstrated promising predictive performance for the random forest classifier using only lab measurements as features. We extend this framework by including diagnosis and drug prescription codes alongside the lab measurements. In addition, we apply a recursive feature selection mechanism on top that extracts the top-k most important features. Our experimental evaluation on five medical datasets of adverse drug events and six different classifiers suggests that integrating these additional features provides substantial and statistically significant improvements in terms of AUC, while employing medically relevant features.
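The pipeline described in this abstract (concatenating lab, diagnosis, and drug features, then recursively selecting the top-k features for a random forest and scoring by AUC) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the feature names and the value of `k` are hypothetical, and scikit-learn's `RFE` is used here as a stand-in for the recursive feature selection mechanism, whose exact form the abstract does not specify.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_patients = 200

# Synthetic stand-ins for the three feature groups (all hypothetical).
labs = rng.normal(size=(n_patients, 5))                # numeric lab measurements
diagnoses = rng.integers(0, 2, size=(n_patients, 4))   # diagnosis-code indicators
drugs = rng.integers(0, 2, size=(n_patients, 3))       # drug-code indicators

# Aggregate the heterogeneous feature groups into one feature matrix.
X = np.hstack([labs, diagnoses, drugs])
feature_names = (
    [f"lab_{i}" for i in range(labs.shape[1])]
    + [f"diag_{i}" for i in range(diagnoses.shape[1])]
    + [f"drug_{i}" for i in range(drugs.shape[1])]
)

# Synthetic ADE label driven by one lab value and one drug code (assumption).
noise = rng.normal(scale=0.5, size=n_patients)
y = ((labs[:, 0] + 1.5 * drugs[:, 1] + noise) > 0.75).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Recursive feature elimination: refit the forest, drop the least
# important feature, and repeat until k features remain.
k = 4  # hypothetical choice of top-k
selector = RFE(
    RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=k,
    step=1,
)
selector.fit(X_train, y_train)
selected = [name for name, keep in zip(feature_names, selector.support_) if keep]

# Train a fresh forest on the selected features and score it by AUC.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(selector.transform(X_train), y_train)
scores = clf.predict_proba(selector.transform(X_test))[:, 1]
auc = roc_auc_score(y_test, scores)

print("selected features:", selected)
print("test AUC:", round(auc, 3))
```

Inspecting `selected` shows which feature groups survive the elimination, which is the kind of check one would use to judge whether the retained features are medically plausible.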
Abstract: The increased adoption of Electronic Health Records (EHRs) has changed the way patient care is carried out. The rich, heterogeneous, and temporal data stored in EHRs can be leveraged by machine learning models to capture the underlying information and make clinically relevant predictions. This can be exploited to support public health activities such as pharmacovigilance, and specifically to mitigate the public health issue of Adverse Drug Events (ADEs). The aim of this article is, therefore, to investigate the various ways of handling temporal data for the purpose of detecting ADEs. Based on a review of the existing literature, 11 articles from the last 10 years were chosen for study. In the retrieved literature, the main methods were found to fall into 5 different approaches: those based on temporal abstraction, graph-based methods, learning weights, and data tables containing time series of different length. Overall, EHRs are a valuable source that has steered current research toward the automatic detection of ADEs. Yet many challenges remain in exploiting the heterogeneous data types and the temporal information included in EHRs for predicting ADEs.