Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Multi-armed Bandits with Cost Subsidy

Nov 13, 2020

Deeksha Sinha, Karthik Abinav Sankararama, Abbas Kazerouni, Vashist Avadhanula

Figure 1 for Multi-armed Bandits with Cost Subsidy

Figure 2 for Multi-armed Bandits with Cost Subsidy

Figure 3 for Multi-armed Bandits with Cost Subsidy

Share this with someone who'll enjoy it:

Abstract:In this paper, we consider a novel variant of the multi-armed bandit (MAB) problem, MAB with cost subsidy, which models many real-life applications where the learning agent has to pay to select an arm and is concerned about optimizing cumulative costs and rewards. We present two applications, intelligent SMS routing problem and ad audience optimization problem faced by a number of businesses (especially online platforms) and show how our problem uniquely captures key features of these applications. We show that naive generalizations of existing MAB algorithms like Upper Confidence Bound and Thompson Sampling do not perform well for this problem. We then establish fundamental lower bound of $\Omega(K^{1/3} T^{2/3})$ on the performance of any online learning algorithm for this problem, highlighting the hardness of our problem in comparison to the classical MAB problem (where $T$ is the time horizon and $K$ is the number of arms). We also present a simple variant of explore-then-commit and establish near-optimal regret bounds for this algorithm. Lastly, we perform extensive numerical simulations to understand the behavior of a suite of algorithms for various instances and recommend a practical guide to employ different algorithms.

View paper on

Share this with someone who'll enjoy it:

Title:Multi-armed Bandits with Cost Subsidy

Paper and Code