Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning

Jun 11, 2024

Adhyyan Narang, Andrew Wagenmaker, Lillian Ratliff, Kevin Jamieson

Figure 1 for Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning

Figure 2 for Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning

Figure 3 for Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning

Share this with someone who'll enjoy it:

Abstract:In this paper, we study the non-asymptotic sample complexity for the pure exploration problem in contextual bandits and tabular reinforcement learning (RL): identifying an epsilon-optimal policy from a set of policies with high probability. Existing work in bandits has shown that it is possible to identify the best policy by estimating only the difference between the behaviors of individual policies, which can be substantially cheaper than estimating the behavior of each policy directly. However, the best-known complexities in RL fail to take advantage of this and instead estimate the behavior of each policy directly. Does it suffice to estimate only the differences in the behaviors of policies in RL? We answer this question positively for contextual bandits but in the negative for tabular RL, showing a separation between contextual bandits and RL. However, inspired by this, we show that it almost suffices to estimate only the differences in RL: if we can estimate the behavior of a single reference policy, it suffices to only estimate how any other policy deviates from this reference policy. We develop an algorithm which instantiates this principle and obtains, to the best of our knowledge, the tightest known bound on the sample complexity of tabular RL.

* 59 pages, 2 Figures

View paper on

Share this with someone who'll enjoy it:

Title:Sample Complexity Reduction via Policy Difference Estimation in Tabular Reinforcement Learning

Paper and Code