Abstract:We propose a novel regression adjustment method for estimating distributional treatment effect parameters in randomized experiments. Randomized experiments have been used extensively to estimate treatment effects in various scientific fields. However, to gain deeper insights, it is essential to estimate distributional treatment effects rather than relying solely on average effects. Our approach incorporates pre-treatment covariates into a distributional regression framework, utilizing machine learning techniques to improve the precision of distributional treatment effect estimators. The proposed approach can be readily implemented with off-the-shelf machine learning methods and remains valid as long as the nuisance components are reasonably well estimated. We also establish the asymptotic properties of the proposed estimator and present a uniformly valid inference method. Through simulations and real data analysis, we demonstrate the effectiveness of integrating machine learning techniques in reducing the variance of distributional treatment effect estimators in finite samples.
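The abstract does not spell out the estimator, but a standard regression-adjusted form for the distributional treatment effect at a threshold $y$ is the AIPW-type moment applied to the indicator outcome $1\{Y \le y\}$. Below is a minimal numpy sketch under that assumption; the cross-fitted ML regressors m1 and m0 (estimates of $P(Y \le y \mid X, D=d)$) and the known treatment probability p are inputs we assume the user supplies.

```python
import numpy as np

def dte_adjusted(y_thresh, Y, D, m1, m0, p=0.5):
    """AIPW-type regression-adjusted estimate of the distributional
    treatment effect P(Y(1) <= y) - P(Y(0) <= y) at threshold y_thresh.

    m1, m0: arrays of cross-fitted ML estimates of P(Y <= y_thresh | X, D=d)
    evaluated at each sample's covariates; p: the known treatment
    probability of the randomized experiment.
    """
    Z = (Y <= y_thresh).astype(float)              # indicator outcome
    psi = (m1 - m0
           + D / p * (Z - m1)                      # correction, treated arm
           - (1 - D) / (1 - p) * (Z - m0))         # correction, control arm
    est = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(len(Y))         # plug-in standard error
    return est, se
```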
Abstract:We address the problem of binary classification from positive and unlabeled data (PU classification) with a selection bias in the positive data. During the observation process, (i) a sample is exposed to a user, (ii) the user then returns the label for the exposed sample, and (iii) we can only observe the positive samples. The positive labels that we observe therefore reflect both the exposure and the labeling, which creates a selection bias problem for the observed positive samples. This scenario provides a conceptual framework for many practical applications, such as recommender systems, and we refer to it as ``learning from positive, unlabeled, and exposure data'' (PUE classification). To tackle this problem, we first assume access to data with exposure labels. We then propose a method to identify the function of interest under a strong ignorability assumption and develop an ``Automatic Debiased PUE'' (ADPUE) learning method. This algorithm directly removes the selection bias without requiring intermediate estimates, such as the propensity score, which other learning methods need. Through experiments, we demonstrate that our approach outperforms traditional PU learning methods on various semi-synthetic datasets.
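To make the selection bias concrete, here is a toy simulation of the observation process described above; the logistic models for the label and the exposure are purely illustrative assumptions, not the paper's data-generating process.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)                                    # covariate
y = (rng.random(n) < 1 / (1 + np.exp(-x))).astype(int)    # true label
o = (rng.random(n) < 1 / (1 + np.exp(x))).astype(int)     # exposure, biased against large x
s = o * y          # we only observe positives among exposed samples

# The naive rate of observed positives understates the true positive rate,
# and the bias depends on x through the exposure mechanism:
print("true P(y=1):         ", y.mean())
print("observed positive rate:", s.mean())
```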
Abstract:We consider learning causal relationships under conditional moment conditions. Unlike causal inference under unconditional moment conditions, conditional moment conditions pose serious challenges for causal inference, especially in complex, high-dimensional settings. To address this issue, we propose a method that transforms conditional moment conditions into unconditional moment conditions through importance weighting, using the conditional density ratio. Building on this transformation, we propose a method that approximates conditional moment conditions well. Our approach allows us to employ methods for estimating causal parameters from unconditional moment conditions, such as the generalized method of moments, in a straightforward manner. In experiments, we confirm that our proposed method performs well compared to existing methods.
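The paper's density-ratio construction is not reproduced here, but the generic step of turning a conditional moment condition into unconditional ones, and then estimating the parameter by the generalized method of moments, can be sketched as follows; the instrument functions $1, w, w^2$ are an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
w = rng.normal(size=n)                    # conditioning variable
x = w + rng.normal(scale=0.5, size=n)     # regressor driven by w
theta_true = 2.0
y = theta_true * x + rng.normal(size=n)

# The conditional moment E[y - theta*x | w] = 0 implies, for any functions
# h_j, the unconditional moments E[h_j(w) * (y - theta*x)] = 0.
H = np.column_stack([np.ones(n), w, w**2])   # illustrative instrument functions

def gbar(theta):
    resid = y - theta * x
    return (H * resid[:, None]).mean(axis=0)  # stacked sample moments

# One-step GMM with identity weighting: minimize ||gbar(theta)||^2 on a grid.
grid = np.linspace(0.0, 4.0, 4001)
obj = [gbar(t) @ gbar(t) for t in grid]
print("GMM estimate:", grid[int(np.argmin(obj))])   # close to 2.0
```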
Abstract:Off-policy evaluation (OPE) is the problem of estimating the value of a target policy from samples obtained via different policies. Recently, applying OPE methods to bandit problems has garnered attention. To theoretically guarantee an estimator of the policy value, OPE methods require various conditions on the target policy and the policy used to generate the samples. However, existing studies have not carefully discussed the practical situations in which such conditions hold, and a gap between theory and practice remains. This paper aims to provide new results that bridge this gap. Based on the properties of the evaluation policy, we categorize OPE situations. Among practical applications, we mainly discuss best policy selection, and for this situation we propose a meta-algorithm based on existing OPE estimators. In experiments, we investigate the proposed concepts using synthetic and open real-world datasets.
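The abstract does not define the meta-algorithm, so the sketch below only illustrates the underlying primitive: estimating each candidate policy's value from logged bandit data with the standard inverse probability weighting (IPW) estimator and selecting the candidate with the highest estimate.

```python
import numpy as np

def ipw_value(rewards, pi_b, pi_e):
    """IPW estimate of a target policy's value from logged bandit data.

    pi_b: behavior-policy probabilities of the logged actions.
    pi_e: target-policy probabilities of the same logged actions.
    """
    return np.mean(pi_e / pi_b * rewards)

def select_best(rewards, pi_b, candidate_probs):
    """Pick the candidate policy with the highest estimated value.

    candidate_probs: list of arrays, one per candidate policy, giving that
    policy's probabilities of the logged actions.
    """
    values = [ipw_value(rewards, pi_b, pe) for pe in candidate_probs]
    return int(np.argmax(values)), values
```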
Abstract:We consider training a binary classifier under delayed feedback (DF Learning). In DF Learning, we first receive negative samples; subsequently, some samples turn positive. This problem arises in various real-world applications, such as online advertising, where a user action may take place long after the first click. Owing to the delayed feedback, simply separating the positive and negative data causes a sample selection bias. One solution is to assume that a sufficiently long time window after first observing a sample reduces the sample selection bias. However, existing studies report that using only a portion of all samples based on the time window assumption yields suboptimal performance, whereas using all samples along with the time window assumption improves empirical performance. Extending these studies, we propose a method with an unbiased and convex empirical risk constructed from all samples under the time window assumption. We provide experimental results on a real traffic log dataset to demonstrate the effectiveness of the proposed method.
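The paper's convex risk is not reproduced in the abstract; as a point of reference, the sketch below shows a generic inverse-weighted risk that is unbiased when each true positive has matured (turned positive) by training time with a known per-sample probability w. Its negative-class weight can be negative, which is exactly the convexity issue the paper's construction is designed to avoid.

```python
import numpy as np

def unbiased_df_risk(scores, s, w):
    """Inverse-weighted empirical logistic risk under delayed feedback.

    scores: classifier scores f(x); s: labels observed at training time
    (1 = observed positive); w: per-sample probability that a true positive
    has matured by training time (assumed known here). Unbiased for the
    clean-label risk, but the weight (1 - s/w) can be negative, so this
    naive form is not guaranteed convex, unlike the paper's construction.
    """
    loss_pos = np.logaddexp(0.0, -scores)   # logistic loss for label +1
    loss_neg = np.logaddexp(0.0, scores)    # logistic loss for label -1
    return np.mean(s / w * loss_pos + (1 - s / w) * loss_neg)
```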
Abstract:We consider the evaluation and training of a new policy on evaluation data by using historical data obtained from a different policy. The goal of off-policy evaluation (OPE) is to estimate the expected reward of a new policy over the evaluation data, and that of off-policy learning (OPL) is to find a new policy that maximizes the expected reward over the evaluation data. Although standard OPE and OPL assume the same covariate distribution between the historical and evaluation data, a covariate shift often arises in practice, i.e., the covariate distribution of the historical data differs from that of the evaluation data. In this paper, we derive the efficiency bound of OPE under a covariate shift. We then propose doubly robust and efficient estimators for OPE and OPL under a covariate shift by using an estimator of the density ratio between the distributions of the historical and evaluation data. We also discuss other possible estimators and compare their theoretical properties. Finally, we confirm the effectiveness of the proposed estimators through experiments.
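A minimal sketch of the doubly robust form described above, assuming a finite action set, a fitted density ratio r_hat, and a fitted reward regression q_hat (all hypothetical interfaces): the historical data supply the importance-weighted correction term, reweighted to the evaluation covariate distribution, while the evaluation covariates supply the plug-in term.

```python
import numpy as np

def dr_ope_covariate_shift(x_h, a_h, y_h, pi_b, pi_e_h, r_hat, q_hat,
                           x_e, pi_e_full):
    """Doubly robust OPE under covariate shift (sketch).

    r_hat(x): estimated density ratio p_eval(x) / p_hist(x).
    q_hat(x, a): estimated reward regression (vectorized callable).
    pi_b, pi_e_h: behavior / target probabilities of the logged actions.
    pi_e_full[j, a]: target-policy probabilities over all actions at the
    evaluation covariates x_e.
    """
    # Correction term on the historical data, reweighted to the
    # evaluation covariate distribution via the density ratio.
    correction = r_hat(x_h) * pi_e_h / pi_b * (y_h - q_hat(x_h, a_h))
    # Plug-in term on the evaluation covariates.
    n_e, n_actions = pi_e_full.shape
    plug_in = sum(pi_e_full[:, a] * q_hat(x_e, np.full(n_e, a))
                  for a in range(n_actions))
    return correction.mean() + plug_in.mean()
```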
Abstract:We develop a method for predicting the performance of reinforcement learning and bandit algorithms, given historical data that may have been generated by a different algorithm. Our estimator has the property that its prediction converges in probability to the true performance of a counterfactual algorithm at the fast $\sqrt{N}$ rate as the sample size $N$ increases. We also show a correct way to estimate the variance of our prediction, thus allowing the analyst to quantify the uncertainty in the prediction. These properties hold even when the analyst does not know which among a large number of potentially important state variables are truly important. These theoretical guarantees make our estimator safe to use. Finally, we apply it to improve advertisement design for a major advertising company and find that our method produces smaller mean squared errors than state-of-the-art methods.
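The abstract does not give the estimator's form, so the sketch below only illustrates the last step it promises: given per-sample influence values whose mean is a $\sqrt{N}$-consistent estimate, the variance estimate and a confidence interval follow from the sample standard deviation.

```python
import numpy as np

def point_and_interval(psi, level=1.96):
    """Given per-sample influence values psi whose mean is the estimator,
    return the point estimate and a (level * sigma) confidence interval."""
    n = len(psi)
    est = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(n)    # sqrt(N)-rate standard error
    return est, (est - level * se, est + level * se)
```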
Abstract:In display advertising, predicting the conversion rate, that is, the probability that a user takes a predefined action on an advertiser's website, such as purchasing goods, is fundamental in estimating the value of displaying the advertisement. However, there is a relatively long time delay between a click and its resultant conversion. Because of this delayed feedback, some positive instances in the training period are labeled as negative, since their conversions have not yet occurred when the training data are gathered. As a result, the conditional label distributions differ between the training data and the production environment. This situation is referred to as a feedback shift. We address this problem with an importance weight approach typically used for covariate shift correction, and we prove its consistency under the feedback shift. Results from both offline and online experiments show that our proposed method outperforms the existing method.
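A minimal sketch of an importance-weighted log loss for the feedback shift, assuming the per-sample weights w_pos and w_neg (which reweight the observed positives and negatives toward the true conditional label distribution) have already been estimated; how to estimate them is the method's core and is not reproduced here.

```python
import numpy as np

def fs_weighted_logloss(p_pred, s, w_pos, w_neg):
    """Importance-weighted log loss under a feedback shift (sketch).

    p_pred: predicted conversion probabilities; s: labels observed at
    training time (some true positives appear as 0 due to delay);
    w_pos, w_neg: per-sample importance weights correcting the observed
    positive / negative terms toward the true label distribution.
    """
    eps = 1e-12
    pos = -s * w_pos * np.log(p_pred + eps)
    neg = -(1 - s) * w_neg * np.log(1 - p_pred + eps)
    return np.mean(pos + neg)
```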
Abstract:In display advertising, predicting the conversion rate, that is, the probability that a user takes a predefined action on an advertiser's website, is fundamental in estimating the value of showing the user an advertisement. There are two troublesome difficulties in conversion rate prediction due to delayed feedback. First, some positive labels are not correctly observed in the training data, because some conversions do not occur right after clicking the ads. Moreover, the delay mechanism is not uniform across instances; some positive feedback is observed much more frequently than others. It is widely acknowledged that these problems cause a severe bias in the naive empirical average loss function for conversion rate prediction. To overcome these challenges, we propose two unbiased estimators, one for the conversion rate prediction and the other for the bias estimation. Subsequently, we propose an interactive learning algorithm named {\em Dual Learning Algorithm for Delayed Feedback (DLA-DF)}, in which a conversion rate predictor and a bias estimator are learned alternately. The proposed algorithm is the first of its kind to address the two major challenges in a theoretically principled way. Lastly, we conduct a simulation experiment to demonstrate that the proposed method outperforms existing baselines and to validate that the unbiased estimation approach is suitable for the delayed feedback problem.
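The abstract specifies only that the two estimators are learned alternately, so the skeleton below captures that structure with hypothetical callables fit_cvr and fit_bias and a hypothetical weights interface on the bias model.

```python
def dual_learning(fit_cvr, fit_bias, x, s, n_rounds=10):
    """Alternating-optimization skeleton in the spirit of DLA-DF (sketch).

    fit_cvr(x, s, weights) -> cvr_model and fit_bias(x, s, cvr_model) ->
    bias_model are user-supplied training routines; both interfaces are
    hypothetical, as the abstract only states that the conversion rate
    predictor and the bias estimator are learned alternately.
    """
    cvr_model, bias_model = None, None
    for _ in range(n_rounds):
        weights = bias_model.weights(x, s) if bias_model is not None else None
        cvr_model = fit_cvr(x, s, weights)       # debiased CVR update
        bias_model = fit_bias(x, s, cvr_model)   # bias-estimator update
    return cvr_model, bias_model
```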
Abstract:What is the most effective way to select the best causal model among potential candidates? In this paper, we propose a method to effectively select the best individual-level treatment effect (ITE) predictor from a set of candidates using only an observational validation set. In model selection or hyperparameter tuning, we are interested in choosing the best model or hyperparameter values from potential candidates. We therefore focus on accurately preserving the rank order of the ITE prediction performance of candidate causal models. We prove theoretically that the proposed evaluation metric preserves the true ranking of model performance in expectation and minimizes the upper bound of the finite-sample uncertainty in model selection. Consistent with these theoretical results, empirical experiments demonstrate that our proposed method is more likely to select the best model and hyperparameters in both model selection and hyperparameter tuning.
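The paper's metric is not given in the abstract; as one concrete instance of a ranking-preserving score, the sketch below ranks candidate ITE predictors by their MSE against an IPW pseudo-outcome. Since the pseudo-outcome's conditional mean equals the true ITE, the expected score differs from the true MSE only by a model-independent constant, so the ranking is preserved in expectation.

```python
import numpy as np

def rank_ite_models(models, x_val, d_val, y_val, e_val):
    """Rank candidate ITE predictors on an observational validation set.

    models: list of callables mapping covariates to ITE predictions.
    e_val: propensity scores on the validation set. The IPW pseudo-outcome
    tau_tilde satisfies E[tau_tilde | x] = ITE(x), so its MSE preserves
    the true model ranking in expectation (the paper's metric may differ).
    """
    tau_tilde = d_val * y_val / e_val - (1 - d_val) * y_val / (1 - e_val)
    scores = [np.mean((m(x_val) - tau_tilde) ** 2) for m in models]
    return np.argsort(scores), scores   # best-scoring model first
```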