Abstract:Latent variable models (LVMs) are probabilistic models where some of the variables are hidden during training. A broad class of LVMshave a directed acyclic graphical structure. The directed structure suggests an intuitive causal explanation of the data generating process. For example, a latent topic model suggests that topics cause the occurrence of a token. Despite this intuitive causal interpretation, a directed acyclic latent variable model trained on data is generally insufficient for causal reasoning, as the required model parameters may not be uniquely identified. In this manuscript we demonstrate that an LVM can answer any causal query posed post-training, provided that the query can be identified from the observed variables according to the do-calculus rules. We show that causal reasoning can enhance a broad class of LVM long established in the probabilistic modeling community, and demonstrate its effectiveness on several case studies. These include a machine learning model with multiple causes where there exists a set of latent confounders and a mediator between the causes and the outcome variable, a study where the identifiable causal query cannot be estimated using the front-door or back-door criterion, a case study that captures unobserved crosstalk between two biological signaling pathways, and a COVID-19 expert system that identifies multiple causal queries.
Abstract:Counterfactual inference is a useful tool for comparing outcomes of interventions on complex systems. It requires us to represent the system in form of a structural causal model, complete with a causal diagram, probabilistic assumptions on exogenous variables, and functional assignments. Specifying such models can be extremely difficult in practice. The process requires substantial domain expertise, and does not scale easily to large systems, multiple systems, or novel system modifications. At the same time, many application domains, such as molecular biology, are rich in structured causal knowledge that is qualitative in nature. This manuscript proposes a general approach for querying a causal biological knowledge graph, and converting the qualitative result into a quantitative structural causal model that can learn from data to answer the question. We demonstrate the feasibility, accuracy and versatility of this approach using two case studies in systems biology. The first demonstrates the appropriateness of the underlying assumptions and the accuracy of the results. The second demonstrates the versatility of the approach by querying a knowledge base for the molecular determinants of a severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-induced cytokine storm, and performing counterfactual inference to estimate the causal effect of medical countermeasures for severely ill patients.