Othmane Safsafi

Batched Bandits with Crowd Externalities

Sep 29, 2021

Romain Laroche, Othmane Safsafi, Raphael Feraud, Nicolas Broutin

Abstract: In Batched Multi-Armed Bandits (BMAB), the policy is not allowed to be updated at each time step. Usually, the setting imposes a maximum number of allowed policy updates, and the algorithm schedules them so as to minimize the expected regret. In this paper, we describe a novel setting for BMAB, with the following twist: the timing of the policy update is not controlled by the BMAB algorithm; instead, the amount of data received during each batch, called the \textit{crowd}, is influenced by the past selection of arms. We first design a near-optimal policy with approximate knowledge of the parameters, which we prove to have a regret in $\mathcal{O}(\sqrt{\frac{\ln x}{x}}+\epsilon)$, where $x$ is the size of the crowd and $\epsilon$ is the parameter error. Next, we implement a UCB-inspired algorithm that guarantees an additional regret in $\mathcal{O}\left(\max(K\ln T,\sqrt{T\ln T})\right)$, where $K$ is the number of arms and $T$ is the horizon.

* 31 pages
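
To make the setting in the abstract concrete, here is a minimal Python sketch of a batched bandit loop in which the batch size (the crowd) depends on past arm selections. It is not the paper's algorithm: it uses plain UCB1 indices frozen at the start of each batch, plays a single arm for the whole batch, and the crowd dynamics in `toy_crowd_update` are a hypothetical rule chosen only for illustration. The names `batched_ucb`, `toy_crowd_update`, `initial_crowd`, and the Bernoulli reward model are all assumptions.

```python
import math
import random


def toy_crowd_update(prev_crowd, batch_mean_reward):
    """Hypothetical crowd dynamics (an assumption, not from the paper):
    the crowd grows after rewarding batches and shrinks after poor ones."""
    return max(1, int(prev_crowd * (0.5 + batch_mean_reward)))


def batched_ucb(means, num_batches, crowd_fn=toy_crowd_update,
                initial_crowd=10, seed=0):
    """UCB1-style loop where the policy is only updated between batches.

    means: Bernoulli reward means of the arms (illustrative reward model).
    Returns per-arm pull counts and a history of (arm, crowd) per batch.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulative reward per arm
    total = 0             # total pulls so far
    crowd = initial_crowd
    history = []

    for _ in range(num_batches):
        # The arm is chosen once per batch from statistics frozen at batch start.
        if 0 in counts:
            arm = counts.index(0)  # play every arm once first
        else:
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(total) / counts[a]),
            )

        # The whole crowd of this batch is served with the same arm.
        batch_reward = sum(
            1.0 if rng.random() < means[arm] else 0.0 for _ in range(crowd)
        )
        counts[arm] += crowd
        sums[arm] += batch_reward
        total += crowd
        history.append((arm, crowd))

        # The next batch size depends on the outcome of the arm just played.
        crowd = crowd_fn(crowd, batch_reward / crowd)

    return counts, history
```

Calling `batched_ucb([0.3, 0.6], num_batches=50)` returns the pull counts per arm and the sequence of crowd sizes. Under the toy rule, playing a poor arm shrinks the following batches, which is the kind of coupling between arm choice and data volume that the abstract describes.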