Liyuan Xu

Density Ratio-based Proxy Causal Learning Without Density Ratios

Mar 11, 2025

Bariscan Bozkurt, Ben Deaner, Dimitri Meunier, Liyuan Xu, Arthur Gretton

Abstract: We address the setting of Proxy Causal Learning (PCL), which has the goal of estimating causal effects from observed data in the presence of hidden confounding. Proxy methods accomplish this task using two proxy variables related to the latent confounder: a treatment proxy (related to the treatment) and an outcome proxy (related to the outcome). Two approaches have been proposed to perform causal effect estimation given proxy variables; however, only one of these has found mainstream acceptance, since the other was understood to require density ratio estimation, a challenging task in high dimensions. In the present work, we propose a practical and effective implementation of the second approach, which bypasses explicit density ratio estimation and is suitable for continuous and high-dimensional treatments. We employ kernel ridge regression to derive estimators, resulting in simple closed-form solutions for dose-response and conditional dose-response curves, along with consistency guarantees. Our methods empirically demonstrate superior or comparable performance to existing frameworks on synthetic and real-world datasets.

* AISTATS 2025 accepted, 81 pages

Kernel Single Proxy Control for Deterministic Confounding

Aug 08, 2023

Liyuan Xu, Arthur Gretton

Abstract: We consider the problem of causal effect estimation with an unobserved confounder, where we observe a proxy variable that is associated with the confounder. Although Proxy Causal Learning (PCL) uses two proxy variables to recover the true causal effect, we show that a single proxy variable is sufficient for causal estimation if the outcome is generated deterministically, generalizing the Control Outcome Calibration Approach (COCA). We propose two kernel-based methods for this setting: the first based on the two-stage regression approach, and the second based on a maximum moment restriction approach. We prove that both approaches can consistently estimate the causal effect, and we empirically demonstrate that we can successfully recover the causal effect on a synthetic dataset.

A Neural Mean Embedding Approach for Back-door and Front-door Adjustment

Oct 12, 2022

Liyuan Xu, Arthur Gretton

Abstract: We consider the estimation of average and counterfactual treatment effects, under two settings: back-door adjustment and front-door adjustment. The goal in both cases is to recover the treatment effect without having access to a hidden confounder. This objective is attained by first estimating the conditional mean of the desired outcome variable given relevant covariates (the "first stage" regression), and then taking the (conditional) expectation of this function as a "second stage" procedure. We propose to compute these conditional expectations directly using a regression function to the learned input features of the first stage, thus avoiding the need for sampling or density estimation. All functions and features (and in particular, the output features in the second stage) are neural networks learned adaptively from data, with the sole requirement that the final layer of the first stage should be linear. The proposed method is shown to converge to the true causal parameter, and outperforms the recent state-of-the-art methods on challenging causal benchmarks, including settings involving high-dimensional image data.

Importance Weighting Approach in Kernel Bayes' Rule

Feb 05, 2022

Liyuan Xu, Yutian Chen, Arnaud Doucet, Arthur Gretton

Abstract: We study a nonparametric approach to Bayesian computation via feature means, where the expectation of prior features is updated to yield expected posterior features, based on regression from kernel or neural net features of the observations. All quantities involved in the Bayesian update are learned from observed data, making the method entirely model-free. The resulting algorithm is a novel instance of a kernel Bayes' rule (KBR). Our approach is based on importance weighting, which results in superior numerical stability to the existing approach to KBR, which requires operator inversion. We show the convergence of the estimator using a novel consistency analysis on the importance weighting estimator in the infinity norm. We evaluate our KBR on challenging synthetic benchmarks, including a filtering problem with a state-space model involving high-dimensional image observations. The proposed method yields uniformly better empirical performance than the existing KBR, and performance competitive with other methods.

Kernel Methods for Multistage Causal Inference: Mediation Analysis and Dynamic Treatment Effects

Nov 06, 2021

Rahul Singh, Liyuan Xu, Arthur Gretton

Abstract: We propose kernel ridge regression estimators for mediation analysis and dynamic treatment effects over short horizons. We allow treatments, covariates, and mediators to be discrete or continuous, and low, high, or infinite dimensional. We propose estimators of means, increments, and distributions of counterfactual outcomes with closed-form solutions in terms of kernel matrix operations. For the continuous treatment case, we prove uniform consistency with finite sample rates. For the discrete treatment case, we prove root-n consistency, Gaussian approximation, and semiparametric efficiency. We conduct simulations, then estimate mediated and dynamic treatment effects of the US Job Corps program for disadvantaged youth.

* 66 pages. Material in this draft previously appeared in a working paper presented at the 2020 NeurIPS Workshop on ML for Economic Policy (arXiv:2010.04855v1). We have divided the original working paper (arXiv:2010.04855v1) into two projects: one paper focusing on static settings (arXiv:2010.04855) and this paper focusing on dynamic settings
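
For readers unfamiliar with the building block named in this abstract, the following is a minimal sketch of plain kernel ridge regression with a closed-form solution, not the paper's causal estimators. The Gaussian kernel, the lengthscale, the regularization value, the toy sine data, and all function names are assumptions chosen for illustration. The coefficients solve (K + n·λ·I) α = y, and predictions are kernel-weighted sums of α.

```python
import math


def gaussian_kernel(a, b, lengthscale=0.5):
    """Gaussian (RBF) kernel between two scalars (illustrative choice)."""
    return math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2))


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def fit_kernel_ridge(xs, ys, reg=1e-6):
    """Closed-form KRR coefficients: alpha = (K + n * reg * I)^{-1} y."""
    n = len(xs)
    gram = [[gaussian_kernel(xi, xj) for xj in xs] for xi in xs]
    for i in range(n):
        gram[i][i] += n * reg
    return solve_linear_system(gram, ys)


def predict(xs, alpha, x_new):
    """Evaluate the fitted function f(x_new) = sum_i alpha_i * k(x_i, x_new)."""
    return sum(a * gaussian_kernel(xi, x_new) for xi, a in zip(xs, alpha))


# Toy data (assumed for illustration): noiseless samples of sin(x).
train_x = [0.0, 0.5, 1.0, 1.5, 2.0]
train_y = [math.sin(v) for v in train_x]
alpha = fit_kernel_ridge(train_x, train_y)
max_train_error = max(
    abs(predict(train_x, alpha, v) - t) for v, t in zip(train_x, train_y)
)
print("max training error:", max_train_error)
```

With a very small regularization value the fit nearly interpolates the training points; larger values trade training accuracy for smoothness.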

Deep Proxy Causal Learning and its Application to Confounded Bandit Policy Evaluation

Jun 07, 2021

Liyuan Xu, Heishiro Kanagawa, Arthur Gretton

Abstract: Proxy causal learning (PCL) is a method for estimating the causal effect of treatments on outcomes in the presence of unobserved confounding, using proxies (structured side information) for the confounder. This is achieved via two-stage regression: in the first stage, we model relations among the treatment and proxies; in the second stage, we use this model to learn the effect of treatment on the outcome, given the context provided by the proxies. PCL guarantees recovery of the true causal effect, subject to identifiability conditions. We propose a novel method for PCL, the deep feature proxy variable method (DFPV), to address the case where the proxies, treatments, and outcomes are high-dimensional and have nonlinear complex relationships, as represented by deep neural network features. We show that DFPV outperforms recent state-of-the-art PCL methods on challenging synthetic benchmarks, including settings involving high-dimensional image data. Furthermore, we show that PCL can be applied to off-policy evaluation for the confounded bandit problem, in which DFPV also exhibits competitive performance.

On Instrumental Variable Regression for Deep Offline Policy Evaluation

May 21, 2021

Yutian Chen, Liyuan Xu, Caglar Gulcehre, Tom Le Paine, Arthur Gretton, Nando de Freitas, Arnaud Doucet

Abstract: We show that the popular reinforcement learning (RL) strategy of estimating the state-action value (Q-function) by minimizing the mean squared Bellman error leads to a regression problem with confounding, the inputs and output noise being correlated. Hence, direct minimization of the Bellman error can result in significantly biased Q-function estimates. We explain why fixing the target Q-network in Deep Q-Networks and Fitted Q Evaluation provides a way of overcoming this confounding, thus shedding new light on this popular but not well understood trick in the deep RL literature. An alternative approach to address confounding is to leverage techniques developed in the causality literature, notably instrumental variables (IV). We bring together here the literature on IV and RL by investigating whether IV approaches can lead to improved Q-function estimates. This paper analyzes and compares a wide range of recent IV methods in the context of offline policy evaluation (OPE), where the goal is to estimate the value of a policy using logged data only. By applying different IV techniques to OPE, we are not only able to recover previously proposed OPE methods such as model-based techniques but also to obtain competitive new techniques. We find empirically that state-of-the-art OPE methods are closely matched in performance by some IV methods such as AGMM, which were not developed for OPE. We open-source all our code and datasets at https://github.com/liyuan9988/IVOPEwithACME.

Learning Deep Features in Instrumental Variable Regression

Nov 01, 2020

Liyuan Xu, Yutian Chen, Siddarth Srinivasan, Nando de Freitas, Arnaud Doucet, Arthur Gretton

Abstract: Instrumental variable (IV) regression is a standard strategy for learning causal relationships between confounded treatment and outcome variables from observational data by utilizing an instrumental variable, which affects the outcome only through the treatment. In classical IV regression, learning proceeds in two stages: stage 1 performs linear regression from the instrument to the treatment; and stage 2 performs linear regression from the treatment to the outcome, conditioned on the instrument. We propose a novel method, deep feature instrumental variable regression (DFIV), to address the case where relations between instruments, treatments, and outcomes may be nonlinear. In this case, deep neural nets are trained to define informative nonlinear features on the instruments and treatments. We propose an alternating training regime for these features to ensure good end-to-end performance when composing stages 1 and 2, thus obtaining highly flexible feature maps in a computationally efficient manner. DFIV outperforms recent state-of-the-art methods on challenging IV benchmarks, including settings involving high-dimensional image data. DFIV also exhibits competitive performance in off-policy policy evaluation for reinforcement learning, which can be understood as an IV regression task.
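
The classical two-stage procedure described in this abstract can be illustrated with a minimal sketch of two-stage least squares on one scalar instrument and one scalar treatment. This is the linear baseline only, not DFIV. The synthetic data-generating process, the true effect of 2.0, the sample size, and all function names are assumptions made for illustration.

```python
import random


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y ~ a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(z, x, y):
    """Stage 1: regress treatment x on instrument z.
    Stage 2: regress outcome y on the stage-1 prediction of x."""
    intercept_1, slope_1 = ols_fit(z, x)
    x_hat = [intercept_1 + slope_1 * zi for zi in z]
    return ols_fit(x_hat, y)


# Synthetic confounded data (assumed for illustration only).
TRUE_EFFECT = 2.0
rng = random.Random(0)
z, x, y = [], [], []
for _ in range(20000):
    u = rng.gauss(0.0, 1.0)                # hidden confounder
    zi = rng.gauss(0.0, 1.0)               # instrument, independent of u
    xi = zi + u + 0.1 * rng.gauss(0.0, 1.0)
    yi = TRUE_EFFECT * xi + 3.0 * u + 0.1 * rng.gauss(0.0, 1.0)
    z.append(zi)
    x.append(xi)
    y.append(yi)

_, naive_slope = ols_fit(x, y)             # biased by the confounder
_, iv_slope = two_stage_least_squares(z, x, y)
print("naive OLS slope:", naive_slope)
print("2SLS slope:", iv_slope)
```

Because the confounder drives both treatment and outcome, the direct regression overstates the effect, while the instrument-based estimate should land near the assumed true value.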

Kernel Methods for Policy Evaluation: Treatment Effects, Mediation Analysis, and Off-Policy Planning

Oct 13, 2020

Rahul Singh, Liyuan Xu, Arthur Gretton

Abstract: We propose a novel framework for non-parametric policy evaluation in static and dynamic settings. Under the assumption of selection on observables, we consider treatment effects of the population, of sub-populations, and of alternative populations that may have alternative covariate distributions. We further consider the decomposition of a total effect into a direct effect and an indirect effect (as mediated by a particular mechanism). Under the assumption of sequential selection on observables, we consider the effects of sequences of treatments. Across settings, we allow for treatments that may be discrete, continuous, or even text. Across settings, we allow for estimation of not only counterfactual mean outcomes but also counterfactual distributions of outcomes. We unify analyses across settings by showing that all of these causal learning problems reduce to the re-weighting of a prediction, i.e. causal adjustment. We implement the re-weighting as an inner product in a function space called a reproducing kernel Hilbert space (RKHS), with a closed-form solution that can be computed in one line of code. We prove uniform consistency and provide finite sample rates of convergence. We evaluate our estimators in simulations devised by other authors. We use our new estimators to evaluate continuous and heterogeneous treatment effects of the US Job Corps training program for disadvantaged youth.

* 66 pages, 6 figures

Similarity-based Classification: Connecting Similarity Learning to Binary Classification

Jun 11, 2020

Han Bao, Takuya Shimada, Liyuan Xu, Issei Sato, Masashi Sugiyama

Abstract: In real-world classification problems, pairwise supervision (i.e., a pair of patterns with a binary label indicating whether they belong to the same class or not) can often be obtained at a lower cost than ordinary class labels. Similarity learning is a general framework to utilize such pairwise supervision to elicit useful representations by inferring the relationship between two data points, which encompasses various important preprocessing tasks such as metric learning, kernel learning, graph embedding, and contrastive representation learning. Although elicited representations are expected to perform well in downstream tasks such as classification, little theoretical insight has been given in the literature so far. In this paper, we reveal that a specific formulation of similarity learning is strongly related to the objective of binary classification, which spurs us to learn a binary classifier without ordinary class labels, by fitting the product of real-valued prediction functions of pairwise patterns to their similarity. Our formulation of similarity learning not only generalizes many existing ones, but also admits an excess risk bound showing an explicit connection to classification. Finally, we empirically demonstrate the practical usefulness of the proposed method on benchmark datasets.

* 22 pages
