Abstract: Gaussian graphical models (GGMs) are widely used to estimate network structure in applications ranging from biology to finance. In practice, data is often corrupted by latent confounders, which bias inference of the true underlying graphical structure. In this paper, we compare and contrast two strategies for inference in graphical models with latent confounders: Gaussian graphical models with latent variables (LVGGM) and PCA-based removal of confounding (PCA+GGM). While the two approaches have similar goals, they are motivated by different assumptions about confounding. We explore the connection between them and propose a new method that combines their strengths. We prove the consistency and convergence rate of the PCA-based method and use these results to provide guidance on when to use each method. We demonstrate the effectiveness of our methodology in simulations and in two real-world applications.
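As a rough illustration of the PCA+GGM idea described in this abstract, the sketch below projects out the top principal components of the data and then estimates a precision matrix from the residuals. It is a minimal sketch under stated assumptions, not the paper's procedure: the function names (`remove_top_pcs`, `ridge_precision`, `partial_correlations`) are hypothetical, the number of confounders `k` is assumed known, and a ridge-regularized inverse stands in for whatever sparse GGM estimator (for example, the graphical lasso) would be used in practice.

```python
import numpy as np


def remove_top_pcs(X, k):
    """Project centered data onto the complement of its top-k principal directions."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_top = Vt[:k].T  # shape (p, k)
    return Xc - Xc @ V_top @ V_top.T


def ridge_precision(R, ridge=1e-2):
    """Ridge-regularized inverse covariance of the residuals R.

    Assumption: this is a simple stand-in for a sparse estimator. The ridge
    term is needed because removing k components leaves the sample
    covariance rank-deficient.
    """
    n, p = R.shape
    cov = R.T @ R / (n - 1)
    return np.linalg.inv(cov + ridge * np.eye(p))


def partial_correlations(precision):
    """Convert a precision matrix into partial correlations (graph edge weights)."""
    d = np.sqrt(np.diag(precision))
    pc = -precision / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc
```

Calling `partial_correlations(ridge_precision(remove_top_pcs(X, k)))` on an `n x p` data matrix `X` yields a `p x p` matrix whose large off-diagonal entries suggest candidate edges. How to choose `k` and the regularization level is exactly the kind of question the abstract's theoretical guidance addresses, and this sketch does not attempt to answer it.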
Abstract: A key condition for obtaining reliable estimates of the causal effect of a treatment is overlap (a.k.a. positivity): the distributions of the features used to perform causal adjustment cannot be too different in the treated and control groups. In cases where overlap is poor, causal effect estimators can become brittle, especially when they incorporate weighting. To address this problem, a number of proposals (including confounder selection or dimension reduction methods) incorporate feature representations to induce better overlap between the treated and control groups. A key concern in these proposals is that the representation may introduce confounding bias into the effect estimator. In this paper, we introduce deconfounding scores, which are feature representations that induce better overlap without biasing the target of estimation. We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data. As a proof of concept, we characterize a family of deconfounding scores in a simplified setting with Gaussian covariates, and show that in some simple simulations, these scores can be used to construct estimators with good finite-sample properties. In particular, we show that this technique could be an attractive alternative to standard regularizations that are often applied to IPW and balancing weights.
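To make the brittleness of weighting under poor overlap concrete, the sketch below computes inverse propensity weights, a normalized (Hajek) IPW estimate of the average treatment effect, and the effective sample size of a set of weights. It is an illustrative sketch only and does not implement deconfounding scores: the function names are hypothetical, the propensity scores `e` are assumed to be given, and propensity clipping (the `eps` parameter) is shown as one example of the kind of standard regularization the abstract refers to.

```python
def clip_propensity(e, eps):
    """Clip a propensity score into [eps, 1 - eps]; eps = 0 leaves it unchanged."""
    return min(max(e, eps), 1.0 - eps)


def ipw_weights(t, e, eps=0.0):
    """Inverse propensity weights: 1/e for treated units, 1/(1 - e) for controls.

    Without clipping, every propensity must lie strictly between 0 and 1.
    """
    weights = []
    for ti, ei in zip(t, e):
        ei = clip_propensity(ei, eps)
        weights.append(1.0 / ei if ti == 1 else 1.0 / (1.0 - ei))
    return weights


def hajek_ate(y, t, e, eps=0.0):
    """Normalized IPW estimate: weighted treated mean minus weighted control mean."""
    w = ipw_weights(t, e, eps)
    treated_num = sum(wi * yi for wi, yi, ti in zip(w, y, t) if ti == 1)
    treated_den = sum(wi for wi, ti in zip(w, t) if ti == 1)
    control_num = sum(wi * yi for wi, yi, ti in zip(w, y, t) if ti == 0)
    control_den = sum(wi for wi, ti in zip(w, t) if ti == 0)
    return treated_num / treated_den - control_num / control_den


def effective_sample_size(weights):
    """Kish effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(wi * wi for wi in weights)
```

A treated unit with a propensity near zero receives a very large weight, so the effective sample size of the treated group can collapse toward one. Clipping bounds the weights at the cost of changing the estimand, which is the bias concern that motivates looking for alternatives.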