The contextual bandit literature has traditionally focused on algorithms that address the exploration-exploitation tradeoff. In particular, greedy algorithms that exploit current estimates without any exploration may be sub-optimal in general. However, exploration-free greedy algorithms are desirable in practical settings where exploration may be costly or unethical (e.g., clinical trials). Surprisingly, we find that a simple greedy algorithm can be rate-optimal (i.e., it achieves asymptotically optimal regret) if there is sufficient randomness in the observed contexts (covariates). We prove that this is always the case for a two-armed bandit under a general class of context distributions that satisfy a condition we term $\textit{covariate diversity}$. Furthermore, even absent this condition, we show that a greedy algorithm can be rate-optimal with positive probability. Thus, standard bandit algorithms may explore unnecessarily. Motivated by these results, we introduce Greedy-First, a new algorithm that uses only observed contexts and rewards to determine whether to follow a greedy algorithm or to explore. We prove that this algorithm is rate-optimal without any additional assumptions on the context distribution or the number of arms. Extensive simulations demonstrate that Greedy-First successfully reduces exploration and outperforms existing (exploration-based) contextual bandit algorithms such as Thompson sampling and upper confidence bound (UCB) methods.
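To make the exploration-free "greedy" policy discussed above concrete, the following is a minimal illustrative sketch of a greedy linear contextual bandit: each arm maintains a least-squares estimate of its reward parameter, and the arm whose current estimate predicts the highest reward for the observed context is always pulled, with no exploration bonus or posterior sampling. The linear reward model, Gaussian contexts, ridge regularization, and all variable names here are assumptions made for illustration, not the paper's formal setup or the Greedy-First algorithm itself.

```python
# Illustrative sketch (not the paper's exact formulation): a purely greedy
# linear contextual bandit. Each arm's reward is assumed to be a noisy linear
# function of the observed context; the greedy policy always pulls the arm
# whose current least-squares estimate predicts the highest reward.
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 3, 2, 5000                          # context dimension, arms, horizon
beta_true = rng.normal(size=(K, d))           # unknown arm parameters (simulation only)

# Running sufficient statistics for ridge-regularized least squares, per arm.
A = np.array([np.eye(d) for _ in range(K)])   # X^T X + I
b = np.zeros((K, d))                          # X^T y

regret = 0.0
for t in range(T):
    x = rng.normal(size=d)                    # randomly drawn context (covariates)
    beta_hat = np.array([np.linalg.solve(A[k], b[k]) for k in range(K)])
    arm = int(np.argmax(beta_hat @ x))        # greedy: no exploration term, no sampling
    reward = beta_true[arm] @ x + rng.normal(scale=0.5)
    A[arm] += np.outer(x, x)                  # update only the pulled arm's estimate
    b[arm] += reward * x
    regret += np.max(beta_true @ x) - beta_true[arm] @ x

print(f"cumulative regret after {T} rounds: {regret:.1f}")
```

Under contexts with sufficient randomness (as in the covariate diversity condition), every arm's estimate keeps improving from the samples the greedy policy happens to collect, which is the intuition behind the rate-optimality results summarized above.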