Title: Weighted Linear Bandits for Non-Stationary Environments

Sep 19, 2019

Yoan Russac, Claire Vernade, Olivier Cappé

Figure 1 for Weighted Linear Bandits for Non-Stationary Environments

Figure 2 for Weighted Linear Bandits for Non-Stationary Environments

Abstract: We consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. In this setting, the unknown regression parameter is allowed to vary in time. To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past. This involves studying the deviations of the sequential weighted least-squares estimator under generic assumptions. As a by-product, we obtain novel deviation results that can be used beyond non-stationary environments. We provide theoretical guarantees on the behavior of D-LinUCB in both slowly-varying and abruptly-changing environments. We obtain an upper bound on the dynamic regret that is of order $d^{2/3} B_T^{1/3} T^{2/3}$, where $B_T$ is a measure of non-stationarity ($d$ and $T$ being, respectively, the dimension and the horizon). This rate is known to be optimal. We also illustrate the empirical performance of D-LinUCB and compare it with recently proposed alternatives in simulated environments.

* Neural Information Processing Systems (NeurIPS), Dec 2019, Vancouver, Canada
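
The idea of discounted linear regression with exponential weights, as described in the abstract, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name `DiscountedLinUCB`, the parameters `gamma`, `lam`, and `beta`, and the recursive form of the update are assumptions made for illustration, and the exploration bonus is a simplified width with a fixed constant `beta` rather than the confidence bound analyzed in the paper.

```python
import numpy as np


class DiscountedLinUCB:
    """Illustrative discounted linear UCB (hypothetical sketch, not the paper's exact algorithm)."""

    def __init__(self, dim, gamma=0.95, lam=1.0, beta=1.0):
        self.gamma = gamma            # discount factor in (0, 1); smaller forgets faster
        self.lam = lam                # ridge regularization strength
        self.beta = beta              # fixed exploration constant (assumption)
        self.V = lam * np.eye(dim)    # discounted design matrix plus regularization
        self.b = np.zeros(dim)        # discounted sum of reward-weighted actions

    def estimate(self):
        # Weighted least-squares estimate: solve V theta = b.
        return np.linalg.solve(self.V, self.b)

    def select(self, actions):
        # Pick the action with the largest optimistic index.
        actions = np.asarray(actions, dtype=float)
        theta_hat = self.estimate()
        V_inv = np.linalg.inv(self.V)
        # Simplified width sqrt(a^T V^{-1} a) for each candidate action a.
        widths = np.sqrt(np.einsum("ij,jk,ik->i", actions, V_inv, actions))
        return int(np.argmax(actions @ theta_hat + self.beta * widths))

    def update(self, action, reward):
        # Discount past statistics by gamma, then add the new observation.
        action = np.asarray(action, dtype=float)
        g = self.gamma
        eye = np.eye(len(action))
        # The (1 - g) * lam term keeps the regularization level at lam * I.
        self.V = g * self.V + np.outer(action, action) + (1 - g) * self.lam * eye
        self.b = g * self.b + reward * action


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    agent = DiscountedLinUCB(dim=2, gamma=0.9)
    theta = np.array([1.0, 0.0])
    for t in range(400):
        if t == 200:
            theta = np.array([-1.0, 0.0])  # abrupt change of the parameter
        actions = rng.normal(size=(5, 2))
        k = agent.select(actions)
        reward = actions[k] @ theta + 0.1 * rng.normal()
        agent.update(actions[k], reward)
    print(agent.estimate())
```

With `gamma` close to 1 the estimator behaves like ordinary ridge regression and adapts slowly; a smaller `gamma` tracks changes faster at the cost of a noisier estimate, which is the trade-off that the measure of non-stationarity $B_T$ governs in the regret bound.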
