Qingyuan Zhao

A Graphical Approach to State Variable Selection in Off-policy Learning

Jan 01, 2025

Joakim Blach Andersen, Qingyuan Zhao

Abstract: Sequential decision problems are widely studied across many areas of science. A key challenge when learning policies from historical data, a practice commonly referred to as off-policy learning, is how to "identify" the impact of a policy of interest when the observed data are not randomized. Off-policy learning has mainly been studied in two settings: dynamic treatment regimes (DTRs), where the focus is on controlling confounding in medical problems with short decision horizons, and offline reinforcement learning (RL), where the focus is on dimension reduction in closed systems such as games. The gap between these two well-studied settings has limited the wider application of off-policy learning to many real-world problems. Using the theory for causal inference based on acyclic directed mixed graphs (ADMGs), we provide a set of graphical identification criteria in general decision processes that encompass both DTRs and MDPs. We discuss how our results relate to the often implicit causal assumptions made in the DTR and RL literatures and further clarify several common misconceptions. Finally, we present a realistic simulation study for the dynamic pricing problem encountered in container logistics, and demonstrate how violations of our graphical criteria can lead to suboptimal policies.

* 25 pages (not including appendix and references), 10 figures, 2 tables

Counterfactual explainability of black-box prediction models

Nov 03, 2024

Zijun Gao, Qingyuan Zhao

Abstract: It is crucial to be able to explain black-box prediction models to use them effectively and safely in practice. Most existing tools for model explanations are associational rather than causal, and we use two paradoxical examples to show that such explanations are generally inadequate. Motivated by the concept of genetic heritability in twin studies, we propose a new notion called counterfactual explainability for black-box prediction models. Counterfactual explainability has three key advantages: (1) it leverages counterfactual outcomes and extends methods for global sensitivity analysis (such as functional analysis of variance and Sobol's indices) to a causal setting; (2) it is defined not only for the totality of a set of input factors but also for their interactions (indeed, it is a probability measure on a whole "explanation algebra"); (3) it also applies to dependent input factors whose causal relationship can be modeled by a directed acyclic graph, thus incorporating causal mechanisms into the explanation.

* 19 pages, 3 figures

Forward and Backward State Abstractions for Off-policy Evaluation

Jun 27, 2024

Meiling Hao, Pingfan Su, Liyuan Hu, Zoltan Szabo, Qingyuan Zhao, Chengchun Shi

Abstract: Off-policy evaluation (OPE) is crucial for evaluating a target policy's impact offline before its deployment. However, achieving accurate OPE in large state spaces remains challenging. This paper studies state abstractions, originally designed for policy learning, in the context of OPE. Our contributions are three-fold: (i) We define a set of irrelevance conditions central to learning state abstractions for OPE. (ii) We derive sufficient conditions for achieving irrelevance in Q-functions and marginalized importance sampling ratios, the latter obtained by constructing a time-reversed Markov decision process (MDP) based on the observed MDP. (iii) We propose a novel two-step procedure that sequentially projects the original state space into a smaller space, which substantially simplifies the sample complexity of OPE arising from high cardinality.

* 42 pages, 5 figures
