Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Importance Sampling Policy Evaluation with an Estimated Behavior Policy

Sep 24, 2018

Josiah Hanna, Scott Niekum, Peter Stone

Figure 1 for Importance Sampling Policy Evaluation with an Estimated Behavior Policy

Figure 2 for Importance Sampling Policy Evaluation with an Estimated Behavior Policy

Figure 3 for Importance Sampling Policy Evaluation with an Estimated Behavior Policy

Share this with someone who'll enjoy it:

Abstract:We consider the problem of off-policy evaluation in Markov decision processes. Off-policy evaluation is the task of evaluating the expected return of one policy with data generated by a different, behavior policy. Importance sampling is a technique for off-policy evaluation that re-weights off-policy returns to account for differences in the likelihood of the returns between the two policies. In this paper, we study importance sampling with an estimated behavior policy where the behavior policy estimate comes from the same set of data used to compute the importance sampling estimate. We find that this estimator often lowers the mean squared error of off-policy evaluation compared to importance sampling with the true behavior policy or using a behavior policy that is estimated from a separate data set. Our empirical results also extend to other popular variants of importance sampling and show that estimating a non-Markovian behavior policy can further lower mean squared error even when the true behavior policy is Markovian.

View paper on

Share this with someone who'll enjoy it:

Title:Importance Sampling Policy Evaluation with an Estimated Behavior Policy

Paper and Code