Abstract:Discovering causal effects is at the core of scientific investigation but remains challenging when only observational data is available. In practice, causal networks are difficult to learn and interpret, and limited to relatively small datasets. We report a more reliable and scalable causal discovery method (iMIIC), based on a general mutual information supremum principle, which greatly improves the precision of inferred causal relations while distinguishing genuine causes from putative and latent causal effects. We showcase iMIIC on synthetic and real-life healthcare data from 396,179 breast cancer patients from the US Surveillance, Epidemiology, and End Results program. More than 90\% of predicted causal effects appear correct, while the remaining unexpected direct and indirect causal effects can be interpreted in terms of diagnostic procedures, therapeutic timing, patient preference or socio-economic disparity. iMIIC's unique capabilities open up new avenues to discover reliable and interpretable causal networks across a range of research fields.
Abstract:While the emergence of large administrative claims data provides opportunities for research, their use remains limited by the lack of clinical annotations relevant to disease outcomes, such as recurrence in breast cancer (BC). Several challenges arise from the annotation of such endpoints in administrative claims, including the need to infer both the occurrence and the date of the recurrence, the right-censoring of data, or the importance of time intervals between medical visits. Deep learning approaches have been successfully used to label temporal medical sequences, but no method is currently able to handle simultaneously right-censoring and visit temporality to detect survival events in medical sequences. We propose EDEN (Event DEtection Network), a time-aware Long-Short-Term-Memory network for survival analyses, and its custom loss function. Our method outperforms several state-of-the-art approaches on real-world BC datasets. EDEN constitutes a powerful tool to annotate disease recurrence from administrative claims, thus paving the way for the massive use of such data in BC research.