Julien Zhou

Thoth, STATIFY

Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit

Oct 11, 2024

Julien Zhou, Pierre Gaillard, Thibaud Rahier, Julyan Arbel

Abstract: We address the online unconstrained submodular maximization problem (Online USM) in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a nonmonotone submodular function taking values in a known bounded interval. This paper proposes Double-Greedy - Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC simultaneously satisfies an O(d log(dT)) problem-dependent upper bound on the 1/2-approximate pseudo-regret and an O(dT^{2/3}log(dT)^{1/3}) problem-free one, outperforming existing approaches. To that end, we introduce a notion of hardness for submodular functions, characterizing how difficult it is to maximize them with this type of strategy.
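For readers unfamiliar with the Double-Greedy approach the abstract builds on, here is a minimal Python sketch of the classical offline procedure, assuming exact (noise-free) evaluations of the function. It is not DG-ETC: the abstract does not give the exploration schedule or how noisy rewards are turned into estimates, so none of that is shown. The names `double_greedy`, `cut_value`, and `EDGES`, and the toy path graph, are illustrative assumptions.

```python
import random

def double_greedy(f, d, rng=None):
    """Offline Double-Greedy on the ground set {0, ..., d-1}.

    f   : set function evaluated exactly (assumption: no noise here).
    rng : if None, use the deterministic rule; otherwise a random.Random
          instance used for the randomized rule.
    """
    X, Y = set(), set(range(d))  # X grows from empty, Y shrinks from full
    for i in range(d):
        a = f(X | {i}) - f(X)    # gain from adding i to X
        b = f(Y - {i}) - f(Y)    # gain from removing i from Y
        if rng is None:
            add = a >= b
        else:
            a_pos, b_pos = max(a, 0.0), max(b, 0.0)
            # Add i with probability a+/(a+ + b+); always add if both are 0.
            p = 1.0 if a_pos + b_pos == 0 else a_pos / (a_pos + b_pos)
            add = rng.random() < p
        if add:
            X.add(i)
        else:
            Y.discard(i)
    return X  # X and Y coincide once every item has been processed

# Toy nonmonotone submodular function: cut value on the path 0 - 1 - 2.
EDGES = [(0, 1), (1, 2)]

def cut_value(S):
    return sum((u in S) != (v in S) for u, v in EDGES)

print(double_greedy(cut_value, 3))                    # deterministic rule
print(double_greedy(cut_value, 3, random.Random(0)))  # randomized rule
```

In the offline setting, the randomized rule is the one known to achieve a 1/2-approximation in expectation, which matches the 1/2-approximate pseudo-regret benchmark mentioned in the abstract; the deterministic rule only guarantees 1/3.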

Covariance-Adaptive Least-Squares Algorithm for Stochastic Combinatorial Semi-Bandits

Feb 23, 2024

Julien Zhou, Pierre Gaillard, Thibaud Rahier, Houssam Zenati, Julyan Arbel

Abstract: We address the problem of stochastic combinatorial semi-bandits, where a player can select from $P$ subsets of a set containing $d$ base items. Most existing algorithms (e.g. CUCB, ESCB, OLS-UCB) require prior knowledge of the reward distribution, such as an upper bound on a sub-Gaussian proxy-variance, which is hard to estimate tightly. In this work, we design a variance-adaptive version of OLS-UCB, relying on an online estimation of the covariance structure. Estimating the coefficients of a covariance matrix is much more manageable in practical settings and results in improved regret upper bounds compared to proxy-variance-based algorithms. When covariance coefficients are all non-negative, we show that our approach efficiently leverages the semi-bandit feedback and provably outperforms bandit feedback approaches, not only in exponential regimes where $P \gg d$ but also when $P \le d$, which is not straightforward from most existing analyses.
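To make "online estimation of the covariance structure" under semi-bandit feedback concrete, the sketch below keeps running statistics for every pair of base items that were observed in the same round and returns a plug-in covariance estimate. This is only an illustration under stated assumptions: the abstract does not specify the paper's estimator or its confidence bounds, and the class `PairwiseCovariance` and its methods are hypothetical names.

```python
from collections import defaultdict

class PairwiseCovariance:
    """Running plug-in covariance estimates from partially observed rounds.

    Assumption: each round reveals the rewards of the selected base items
    only (semi-bandit feedback), passed as a dict {item: reward}.
    """

    def __init__(self):
        self.count = defaultdict(int)     # (i, j) -> rounds with both seen
        self.sum_i = defaultdict(float)   # (i, j) -> sum of x_i over those
        self.sum_ij = defaultdict(float)  # (i, j) -> sum of x_i * x_j

    def update(self, observed):
        for i, xi in observed.items():
            for j, xj in observed.items():
                self.count[(i, j)] += 1
                self.sum_i[(i, j)] += xi
                self.sum_ij[(i, j)] += xi * xj

    def estimate(self, i, j):
        """Biased (1/n) estimate of Cov(X_i, X_j); None if never co-observed."""
        n = self.count.get((i, j), 0)
        if n == 0:
            return None
        mean_i = self.sum_i[(i, j)] / n
        mean_j = self.sum_i[(j, i)] / n
        return self.sum_ij[(i, j)] / n - mean_i * mean_j

est = PairwiseCovariance()
est.update({0: 1.0, 1: 2.0})  # round 1: items 0 and 1 selected
est.update({0: 3.0, 1: 6.0})  # round 2: items 0 and 1 selected
est.update({0: 5.0})          # round 3: only item 0 selected
print(est.estimate(0, 1), est.estimate(0, 0), est.estimate(1, 2))
```

Each coefficient is estimated only from rounds in which both items were observed, so pairs that are rarely selected together remain poorly estimated; a full algorithm would need to account for that uncertainty.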
