Abstract:Pandemic control measures like lock-down, restrictions on restaurants and gatherings, social-distancing have shown to be effective in curtailing the spread of COVID-19. However, their sustained enforcement has negative economic effects. To craft strategies and policies that reduce the hardship on the people and the economy while being effective against the pandemic, authorities need to understand the disease dynamics at the right geo-spatial granularity. Considering factors like the hospitals' ability to handle the fluctuating demands, evaluating various reopening scenarios, and accurate forecasting of cases are vital to decision making. Towards this end, we present a flexible end-to-end solution that seamlessly integrates public health data with tertiary client data to accurately estimate the risk of reopening a community. At its core lies a state-of-the-art prediction model that auto-captures changing trends in transmission and mobility. Benchmarking against various published baselines confirm the superiority of our forecasting algorithm. Combined with the ability to extend to multiple client-specific requirements and perform deductive reasoning through counter-factual analysis, this solution provides actionable insights to multiple client domains ranging from government to educational institutions, hospitals, and commercial establishments.
Abstract:Electronic medical records (EMR) contain longitudinal information about patients that can be used to analyze outcomes. Typically, studies on EMR data have worked with established variables that have already been acknowledged to be associated with certain outcomes. However, EMR data may also contain hitherto unrecognized factors for risk association and prediction of outcomes for a disease. In this paper, we present a scalable data-driven framework to analyze EMR data corpus in a disease agnostic way that systematically uncovers important factors influencing outcomes in patients, as supported by data and without expert guidance. We validate the importance of such factors by using the framework to predict for the relevant outcomes. Specifically, we analyze EMR data covering approximately 47 million unique patients to characterize renal failure (RF) among type 2 diabetic (T2DM) patients. We propose a specialized L1 regularized Cox Proportional Hazards (CoxPH) survival model to identify the important factors from those available from patient encounter history. To validate the identified factors, we use a specialized generalized linear model (GLM) to predict the probability of renal failure for individual patients within a specified time window. Our experiments indicate that the factors identified via our data-driven method overlap with the patient characteristics recognized by experts. Our approach allows for scalable, repeatable and efficient utilization of data available in EMRs, confirms prior medical knowledge and can generate new hypothesis without expert supervision.