We propose a \underline{d}oubly \underline{o}ptimistic strategy for the \underline{s}afe-\underline{l}inear-\underline{b}andit problem, DOSLB. The safe linear bandit problem is to optimise an unknown linear reward whilst satisfying unknown round-wise safety constraints on actions, using stochastic bandit feedback on the reward and safety risks of actions. In contrast to prior work on aggregated resource constraints, our formulation explicitly demands control of round-wise safety risks. Unlike existing optimistic-pessimistic paradigms for safe bandits, DOSLB exercises optimism on both fronts, using optimistic estimates of both reward and safety scores to select actions. Surprisingly, we show that DOSLB nonetheless rarely takes risky actions, and attains $\tilde{O}(d \sqrt{T})$ regret, where our notion of regret accounts for both the inefficiency and the lack of safety of actions. Specialising to polytopal domains, we first show that, notably, the $\sqrt{T}$-regret bound cannot be improved even with large gaps, and then identify a slackened notion of regret for which we show tight instance-dependent $O(\log^2 T)$ bounds. We further argue that in such domains, the number of times an overly risky action is played is also bounded as $O(\log^2 T)$.
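As a rough sketch of the doubly optimistic selection rule described above (the action set $\mathcal{X}$, the confidence sets $\mathcal{C}^r_t$ and $\mathcal{C}^s_t$ for the reward and safety parameters, the single scalar constraint, and the threshold $\alpha$ are notation assumed here for illustration rather than the formal definitions used later), at each round the selected action solves
\[
    a_t \in \arg\max_{a \in \mathcal{X}} \; \max_{\theta \in \mathcal{C}^r_t} \langle \theta, a \rangle
    \quad \text{subject to} \quad
    \min_{\mu \in \mathcal{C}^s_t} \langle \mu, a \rangle \le \alpha,
\]
that is, both the reward and the safety score of each candidate action are evaluated optimistically against their respective confidence sets, rather than pairing an optimistic reward estimate with a pessimistic safety estimate.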