Abstract:Cardiotoxicity induced by breast cancer treatments (i.e., chemotherapy, targeted therapy, and radiation therapy) is a significant problem for breast cancer patients, yet the cardiotoxicity risk associated with each treatment remains unclear. We developed and evaluated risk prediction models for cardiotoxicity in breast cancer patients using electronic health record (EHR) data. The models achieved AUC scores of 0.846, 0.857, 0.858, and 0.804 for predicting congestive heart failure (CHF), coronary artery disease (CAD), cardiomyopathy (CM), and myocardial infarction (MI), respectively. After adjusting for baseline differences in cardiovascular health, patients who received chemotherapy or targeted therapy appeared to have a higher risk of cardiotoxicity than patients who received radiation therapy. Because baseline cardiac health differs across the breast cancer treatment groups, caution is recommended when interpreting the cardiotoxic effects of these treatments.
Abstract:Causal inference is a powerful statistical methodology for explanatory analysis and for individualized treatment effect (ITE) estimation, a prominent causal inference task that has become a fundamental research problem. Naive ITE estimation tends to produce biased estimates. Obtaining unbiased estimates requires counterfactual information, which is not directly observable from data. Reliable traditional methods for estimating ITE exist, grounded in mature domain knowledge. In recent years, neural networks have been widely used in clinical studies; in particular, recurrent neural networks (RNNs) have been applied to the analysis of temporal Electronic Health Records (EHR) data. However, RNNs are not guaranteed to automatically discover causal knowledge or to correctly estimate counterfactual information, and thus may not correctly estimate the ITE. Incorrect ITE estimates can hinder the performance of the model. In this work, we study whether RNNs can be guided to correctly incorporate ITE-related knowledge and whether this improves predictive performance. Specifically, we first describe a Causal-Temporal Structure for temporal EHR data; then, based on this structure, we estimate sequential ITE along the timeline using sequential Propensity Score Matching (PSM); finally, we propose a knowledge-guided neural network methodology to incorporate the estimated ITE. We demonstrate on real-world and synthetic data (where the actual ITEs are known) that the proposed methodology can significantly improve the prediction performance of RNNs.
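To illustrate the matching idea behind PSM mentioned in the abstract above, here is a minimal sketch for a single time step. It is not the sequential procedure the abstract describes, and it assumes the propensity scores have already been estimated by some other model. The names `Unit` and `matched_effects`, the `caliper` parameter, and the toy numbers are all hypothetical.

```python
from typing import List, NamedTuple, Optional


class Unit(NamedTuple):
    propensity: float  # assumed precomputed estimate of P(treated | covariates)
    treated: bool
    outcome: float


def matched_effects(units: List[Unit], caliper: float = 0.1) -> List[Optional[float]]:
    """For each treated unit, in input order, return its outcome minus the
    outcome of the control unit with the closest propensity score.

    Returns None for a treated unit when no control lies within `caliper`.
    Matching is with replacement: one control may serve several treated units.
    """
    controls = [u for u in units if not u.treated]
    effects: List[Optional[float]] = []
    for u in units:
        if not u.treated:
            continue
        # Nearest control by absolute propensity distance (None if no controls).
        best = min(controls, key=lambda c: abs(c.propensity - u.propensity), default=None)
        if best is None or abs(best.propensity - u.propensity) > caliper:
            effects.append(None)  # no comparable control, so no estimate
        else:
            effects.append(u.outcome - best.outcome)
    return effects


# Toy data, invented purely for illustration.
toy_units = [
    Unit(0.80, True, 5.0),
    Unit(0.75, False, 3.0),
    Unit(0.25, False, 1.0),
    Unit(0.30, True, 2.0),
    Unit(0.95, True, 9.0),
]
toy_effects = matched_effects(toy_units, caliper=0.1)
```

The matched control's outcome stands in for the unobservable counterfactual outcome of the treated unit, which is why units without a sufficiently close control are left unestimated rather than matched poorly.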
Abstract:Our aging population increasingly suffers from multiple chronic diseases simultaneously, necessitating comprehensive treatment of these conditions. Finding the optimal set of drugs for a combination of diseases is a combinatorial pattern exploration problem. Association rule mining is a popular tool for such problems, but health care requires causal, rather than merely associative, patterns, which renders association rule mining unsuitable. To address this issue, we propose a novel framework based on the Rubin-Neyman causal model for extracting causal rules from observational data, correcting for a number of common biases. Specifically, given a set of interventions and a set of items that define subpopulations (e.g., diseases), we wish to find all subpopulations in which effective intervention combinations exist and, in each such subpopulation, all intervention combinations such that dropping any intervention from the combination reduces the efficacy of the treatment. A key aspect of our framework is the concept of closed intervention sets, which extends the concept of quantifying the effect of a single intervention to a set of concurrent interventions. We evaluated our causal rule mining framework on the Electronic Health Records (EHR) data of a large cohort of patients from Mayo Clinic and showed that the extracted patterns are sufficiently rich to explain the controversial findings in the medical literature regarding the effect of a class of cholesterol drugs on Type-II Diabetes Mellitus (T2DM).
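The dropping condition stated in the abstract above (removing any one intervention from a combination reduces efficacy) can be sketched as a simple check. This is only an illustration of that one condition, not the paper's definition of closed intervention sets or its mining algorithm. The function name `every_member_contributes`, the intervention labels, and the efficacy numbers are hypothetical, and efficacy is assumed to be a score where higher is better.

```python
from typing import Callable, Dict, FrozenSet


def every_member_contributes(
    combo: FrozenSet[str],
    efficacy: Callable[[FrozenSet[str]], float],
) -> bool:
    """Return True if removing any single intervention from `combo`
    strictly lowers the efficacy score. Empty combinations return False."""
    if not combo:
        return False
    full = efficacy(combo)
    return all(efficacy(combo - {item}) < full for item in combo)


# Hypothetical efficacy scores for every subset of interventions A, B, C.
# In this toy table, C adds nothing on top of A and B.
toy_efficacy: Dict[FrozenSet[str], float] = {
    frozenset(): 0.0,
    frozenset("A"): 0.2,
    frozenset("B"): 0.1,
    frozenset("C"): 0.0,
    frozenset("AB"): 0.5,
    frozenset("AC"): 0.2,
    frozenset("BC"): 0.1,
    frozenset("ABC"): 0.5,
}

ab_ok = every_member_contributes(frozenset("AB"), toy_efficacy.__getitem__)
abc_ok = every_member_contributes(frozenset("ABC"), toy_efficacy.__getitem__)
```

With these toy scores the pair {A, B} passes the check, while {A, B, C} fails because dropping C leaves the score unchanged. In practice the efficacy of each subset would itself have to be estimated from observational data with bias correction, which this sketch does not attempt.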