Abstract:We propose a novel regression adjustment method for estimating distributional treatment effect parameters in randomized experiments. Randomized experiments have been used extensively to estimate treatment effects in various scientific fields. However, to gain deeper insights, it is essential to estimate distributional treatment effects rather than relying solely on average effects. Our approach incorporates pre-treatment covariates into a distributional regression framework, utilizing machine learning techniques to improve the precision of distributional treatment effect estimators. The proposed approach can be readily implemented with off-the-shelf machine learning methods and remains valid as long as the nuisance components are reasonably well estimated. We also establish the asymptotic properties of the proposed estimator and present a uniformly valid inference method. Through simulations and real data analysis, we demonstrate the effectiveness of integrating machine learning techniques in reducing the variance of distributional treatment effect estimators in finite samples.
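The abstract does not spell out the estimator, but a standard regression-adjusted form for the distributional treatment effect at a threshold $y$ is the AIPW-type moment applied to the indicator outcome $1\{Y \le y\}$. Below is a minimal numpy sketch under that assumption; the cross-fitted ML regressors m1 and m0 (estimates of $P(Y \le y \mid X, D=d)$) and the known treatment probability p are inputs we assume the user supplies.

```python
import numpy as np

def dte_adjusted(y_thresh, Y, D, m1, m0, p=0.5):
    """AIPW-type regression-adjusted estimate of the distributional
    treatment effect P(Y(1) <= y) - P(Y(0) <= y) at threshold y_thresh.

    m1, m0: arrays of cross-fitted ML estimates of P(Y <= y_thresh | X, D=d)
    evaluated at each sample's covariates; p: the known treatment
    probability of the randomized experiment.
    """
    Z = (Y <= y_thresh).astype(float)              # indicator outcome
    psi = (m1 - m0
           + D / p * (Z - m1)                      # correction, treated arm
           - (1 - D) / (1 - p) * (Z - m0))         # correction, control arm
    est = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(len(Y))         # plug-in standard error
    return est, se
```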
Abstract:We address the problem of binary classification from positive and unlabeled data (PU classification) with a selection bias in the positive data. During the observation process, (i) a sample is exposed to a user, (ii) the user then returns the label for the exposed sample, and (iii) we can only observe the positive samples. The positive labels that we observe therefore reflect both the exposure and the labeling, which creates a selection bias problem for the observed positive samples. This scenario provides a conceptual framework for many practical applications, such as recommender systems, and we refer to it as ``learning from positive, unlabeled, and exposure data'' (PUE classification). To tackle this problem, we first assume access to data with exposure labels. We then propose a method to identify the function of interest under a strong ignorability assumption and develop an ``Automatic Debiased PUE'' (ADPUE) learning method. This algorithm directly removes the selection bias without requiring intermediate estimates, such as the propensity score, which other learning methods need. Through experiments, we demonstrate that our approach outperforms traditional PU learning methods on various semi-synthetic datasets.
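To make the selection bias concrete, here is a toy simulation of the observation process described above; the logistic models for the label and the exposure are purely illustrative assumptions, not the paper's data-generating process.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)                                    # covariate
y = (rng.random(n) < 1 / (1 + np.exp(-x))).astype(int)    # true label
o = (rng.random(n) < 1 / (1 + np.exp(x))).astype(int)     # exposure, biased against large x
s = o * y          # we only observe positives among exposed samples

# The naive rate of observed positives understates the true positive rate,
# and the bias depends on x through the exposure mechanism:
print("true P(y=1):         ", y.mean())
print("observed positive rate:", s.mean())
```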
Abstract:We consider learning causal relationships under conditional moment conditions. Unlike causal inference under unconditional moment conditions, conditional moment conditions pose serious challenges for causal inference, especially in complex, high-dimensional settings. To address this issue, we propose a method that transforms conditional moment conditions into unconditional moment conditions through importance weighting, using the conditional density ratio. Building on this transformation, we propose a method that approximates conditional moment conditions well. Our approach allows us to employ methods for estimating causal parameters from unconditional moment conditions, such as the generalized method of moments, in a straightforward manner. In experiments, we confirm that our proposed method performs well compared to existing methods.
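The paper's density-ratio construction is not reproduced here, but the generic step of turning a conditional moment condition into unconditional ones, and then estimating the parameter by the generalized method of moments, can be sketched as follows; the instrument functions $1, w, w^2$ are an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
w = rng.normal(size=n)                    # conditioning variable
x = w + rng.normal(scale=0.5, size=n)     # regressor driven by w
theta_true = 2.0
y = theta_true * x + rng.normal(size=n)

# The conditional moment E[y - theta*x | w] = 0 implies, for any functions
# h_j, the unconditional moments E[h_j(w) * (y - theta*x)] = 0.
H = np.column_stack([np.ones(n), w, w**2])   # illustrative instrument functions

def gbar(theta):
    resid = y - theta * x
    return (H * resid[:, None]).mean(axis=0)  # stacked sample moments

# One-step GMM with identity weighting: minimize ||gbar(theta)||^2 on a grid.
grid = np.linspace(0.0, 4.0, 4001)
obj = [gbar(t) @ gbar(t) for t in grid]
print("GMM estimate:", grid[int(np.argmin(obj))])   # close to 2.0
```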
Abstract:Off-policy evaluation (OPE) is the problem of estimating the value of a target policy from samples obtained via different policies. Recently, applying OPE methods to bandit problems has garnered attention. To theoretically guarantee an estimator of the policy value, OPE methods require various conditions on the target policy and the policy used to generate the samples. However, existing studies have not carefully discussed the practical situations in which such conditions hold, and a gap between theory and practice remains. This paper aims to provide new results that bridge this gap. Based on the properties of the evaluation policy, we categorize OPE situations. Among practical applications, we mainly discuss best policy selection, and for this situation we propose a meta-algorithm based on existing OPE estimators. In experiments, we investigate the proposed concepts using synthetic and open real-world datasets.
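The abstract does not define the meta-algorithm, so the sketch below only illustrates the underlying primitive: estimating each candidate policy's value from logged bandit data with the standard inverse probability weighting (IPW) estimator and selecting the candidate with the highest estimate.

```python
import numpy as np

def ipw_value(rewards, pi_b, pi_e):
    """IPW estimate of a target policy's value from logged bandit data.

    pi_b: behavior-policy probabilities of the logged actions.
    pi_e: target-policy probabilities of the same logged actions.
    """
    return np.mean(pi_e / pi_b * rewards)

def select_best(rewards, pi_b, candidate_probs):
    """Pick the candidate policy with the highest estimated value.

    candidate_probs: list of arrays, one per candidate policy, giving that
    policy's probabilities of the logged actions.
    """
    values = [ipw_value(rewards, pi_b, pe) for pe in candidate_probs]
    return int(np.argmax(values)), values
```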
Abstract:We consider training a binary classifier under delayed feedback (DF Learning). In DF Learning, we first receive negative samples; subsequently, some samples turn positive. This problem arises in various real-world applications, such as online advertising, where a user action may take place long after the first click. Owing to the delayed feedback, simply separating the positive and negative data causes a sample selection bias. One solution is to assume that a sufficiently long time window after first observing a sample reduces the sample selection bias. However, existing studies report that using only a portion of all samples based on the time window assumption yields suboptimal performance, whereas using all samples along with the time window assumption improves empirical performance. Extending these studies, we propose a method with an unbiased and convex empirical risk constructed from all samples under the time window assumption. We provide experimental results on a real traffic log dataset to demonstrate the effectiveness of the proposed method.
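The paper's convex risk is not reproduced in the abstract; as a point of reference, the sketch below shows a generic inverse-weighted risk that is unbiased when each true positive has matured (turned positive) by training time with a known per-sample probability w. Its negative-class weight can be negative, which is exactly the convexity issue the paper's construction is designed to avoid.

```python
import numpy as np

def unbiased_df_risk(scores, s, w):
    """Inverse-weighted empirical logistic risk under delayed feedback.

    scores: classifier scores f(x); s: labels observed at training time
    (1 = observed positive); w: per-sample probability that a true positive
    has matured by training time (assumed known here). Unbiased for the
    clean-label risk, but the weight (1 - s/w) can be negative, so this
    naive form is not guaranteed convex, unlike the paper's construction.
    """
    loss_pos = np.logaddexp(0.0, -scores)   # logistic loss for label +1
    loss_neg = np.logaddexp(0.0, scores)    # logistic loss for label -1
    return np.mean(s / w * loss_pos + (1 - s / w) * loss_neg)
```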
Abstract:We consider the evaluation and training of a new policy on evaluation data by using historical data obtained from a different policy. The goal of off-policy evaluation (OPE) is to estimate the expected reward of a new policy over the evaluation data, and that of off-policy learning (OPL) is to find a new policy that maximizes the expected reward over the evaluation data. Although standard OPE and OPL assume the same covariate distribution between the historical and evaluation data, a covariate shift often arises in practice, i.e., the covariate distribution of the historical data differs from that of the evaluation data. In this paper, we derive the efficiency bound of OPE under a covariate shift. We then propose doubly robust and efficient estimators for OPE and OPL under a covariate shift by using an estimator of the density ratio between the distributions of the historical and evaluation data. We also discuss other possible estimators and compare their theoretical properties. Finally, we confirm the effectiveness of the proposed estimators through experiments.
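A minimal sketch of the doubly robust form described above, assuming a finite action set, a fitted density ratio r_hat, and a fitted reward regression q_hat (all hypothetical interfaces): the historical data supply the importance-weighted correction term, reweighted to the evaluation covariate distribution, while the evaluation covariates supply the plug-in term.

```python
import numpy as np

def dr_ope_covariate_shift(x_h, a_h, y_h, pi_b, pi_e_h, r_hat, q_hat,
                           x_e, pi_e_full):
    """Doubly robust OPE under covariate shift (sketch).

    r_hat(x): estimated density ratio p_eval(x) / p_hist(x).
    q_hat(x, a): estimated reward regression (vectorized callable).
    pi_b, pi_e_h: behavior / target probabilities of the logged actions.
    pi_e_full[j, a]: target-policy probabilities over all actions at the
    evaluation covariates x_e.
    """
    # Correction term on the historical data, reweighted to the
    # evaluation covariate distribution via the density ratio.
    correction = r_hat(x_h) * pi_e_h / pi_b * (y_h - q_hat(x_h, a_h))
    # Plug-in term on the evaluation covariates.
    n_e, n_actions = pi_e_full.shape
    plug_in = sum(pi_e_full[:, a] * q_hat(x_e, np.full(n_e, a))
                  for a in range(n_actions))
    return correction.mean() + plug_in.mean()
```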
Abstract:We develop a method for predicting the performance of reinforcement learning and bandit algorithms, given historical data that may have been generated by a different algorithm. Our estimator has the property that its prediction converges in probability to the true performance of a counterfactual algorithm at the fast $\sqrt{N}$ rate as the sample size $N$ increases. We also show a correct way to estimate the variance of our prediction, thus allowing the analyst to quantify the uncertainty in the prediction. These properties hold even when the analyst does not know which among a large number of potentially important state variables are truly important. These theoretical guarantees make our estimator safe to use. Finally, we apply it to improve advertisement design for a major advertising company and find that our method produces smaller mean squared errors than state-of-the-art methods.
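The abstract does not give the estimator's form, so the sketch below only illustrates the last step it promises: given per-sample influence values whose mean is a $\sqrt{N}$-consistent estimate, the variance estimate and a confidence interval follow from the sample standard deviation.

```python
import numpy as np

def point_and_interval(psi, level=1.96):
    """Given per-sample influence values psi whose mean is the estimator,
    return the point estimate and a (level * sigma) confidence interval."""
    n = len(psi)
    est = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(n)    # sqrt(N)-rate standard error
    return est, (est - level * se, est + level * se)
```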
Abstract:In display advertising, predicting the conversion rate, that is, the probability that a user takes a predefined action on an advertiser's website, such as purchasing goods, is fundamental in estimating the value of displaying the advertisement. However, there is a relatively long time delay between a click and its resultant conversion. Because of this delayed feedback, some positive instances in the training period are labeled as negative, since their conversions have not yet occurred when the training data are gathered. As a result, the conditional label distributions differ between the training data and the production environment. This situation is referred to as a feedback shift. We address this problem with an importance weight approach typically used for covariate shift correction, and we prove its consistency under the feedback shift. Results from both offline and online experiments show that our proposed method outperforms the existing method.
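A minimal sketch of an importance-weighted log loss for the feedback shift, assuming the per-sample weights w_pos and w_neg (which reweight the observed positives and negatives toward the true conditional label distribution) have already been estimated; how to estimate them is the method's core and is not reproduced here.

```python
import numpy as np

def fs_weighted_logloss(p_pred, s, w_pos, w_neg):
    """Importance-weighted log loss under a feedback shift (sketch).

    p_pred: predicted conversion probabilities; s: labels observed at
    training time (some true positives appear as 0 due to delay);
    w_pos, w_neg: per-sample importance weights correcting the observed
    positive / negative terms toward the true label distribution.
    """
    eps = 1e-12
    pos = -s * w_pos * np.log(p_pred + eps)
    neg = -(1 - s) * w_neg * np.log(1 - p_pred + eps)
    return np.mean(pos + neg)
```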
Abstract:In display advertising, predicting the conversion rate, that is, the probability that a user takes a predefined action on an advertiser's website, is fundamental in estimating the value of showing the user an advertisement. There are two troublesome difficulties in conversion rate prediction due to delayed feedback. First, some positive labels are not correctly observed in the training data, because some conversions do not occur right after clicking the ads. Moreover, the delay mechanism is not uniform across instances; some positive feedback is observed much more frequently than others. It is widely acknowledged that these problems cause a severe bias in the naive empirical average loss function for conversion rate prediction. To overcome these challenges, we propose two unbiased estimators, one for the conversion rate prediction and the other for the bias estimation. Subsequently, we propose an interactive learning algorithm named {\em Dual Learning Algorithm for Delayed Feedback (DLA-DF)}, in which a conversion rate predictor and a bias estimator are learned alternately. The proposed algorithm is the first of its kind to address the two major challenges in a theoretically principled way. Lastly, we conduct a simulation experiment to demonstrate that the proposed method outperforms existing baselines and to validate that the unbiased estimation approach is suitable for the delayed feedback problem.
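The abstract specifies only that the two estimators are learned alternately, so the skeleton below captures that structure with hypothetical callables fit_cvr and fit_bias and a hypothetical weights interface on the bias model.

```python
def dual_learning(fit_cvr, fit_bias, x, s, n_rounds=10):
    """Alternating-optimization skeleton in the spirit of DLA-DF (sketch).

    fit_cvr(x, s, weights) -> cvr_model and fit_bias(x, s, cvr_model) ->
    bias_model are user-supplied training routines; both interfaces are
    hypothetical, as the abstract only states that the conversion rate
    predictor and the bias estimator are learned alternately.
    """
    cvr_model, bias_model = None, None
    for _ in range(n_rounds):
        weights = bias_model.weights(x, s) if bias_model is not None else None
        cvr_model = fit_cvr(x, s, weights)       # debiased CVR update
        bias_model = fit_bias(x, s, cvr_model)   # bias-estimator update
    return cvr_model, bias_model
```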
Abstract:What is the most effective way to select the best causal model among potential candidates? In this paper, we propose a method to effectively select the best individual-level treatment effect (ITE) predictor from a set of candidates using only an observational validation set. In model selection or hyperparameter tuning, we are interested in choosing the best model or hyperparameter values from potential candidates. We therefore focus on accurately preserving the rank order of the ITE prediction performance of candidate causal models. We prove theoretically that the proposed evaluation metric preserves the true ranking of model performance in expectation and minimizes the upper bound of the finite-sample uncertainty in model selection. Consistent with these theoretical results, empirical experiments demonstrate that our proposed method is more likely to select the best model and hyperparameters in both model selection and hyperparameter tuning.
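The paper's metric is not given in the abstract; as one concrete instance of a ranking-preserving score, the sketch below ranks candidate ITE predictors by their MSE against an IPW pseudo-outcome. Since the pseudo-outcome's conditional mean equals the true ITE, the expected score differs from the true MSE only by a model-independent constant, so the ranking is preserved in expectation.

```python
import numpy as np

def rank_ite_models(models, x_val, d_val, y_val, e_val):
    """Rank candidate ITE predictors on an observational validation set.

    models: list of callables mapping covariates to ITE predictions.
    e_val: propensity scores on the validation set. The IPW pseudo-outcome
    tau_tilde satisfies E[tau_tilde | x] = ITE(x), so its MSE preserves
    the true model ranking in expectation (the paper's metric may differ).
    """
    tau_tilde = d_val * y_val / e_val - (1 - d_val) * y_val / (1 - e_val)
    scores = [np.mean((m(x_val) - tau_tilde) ** 2) for m in models]
    return np.argsort(scores), scores   # best-scoring model first
```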