András Antos

Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits

Jul 16, 2015

Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Peter Auer, András Antos

Abstract: In this paper, we study the problem of estimating uniformly well the mean values of several distributions given a finite budget of samples. If the variances of the distributions were known, one could design an optimal sampling strategy by collecting a number of independent samples per distribution that is proportional to its variance. However, in the more realistic case where the distributions are not known in advance, one needs to design adaptive sampling strategies that select which distribution to sample from according to the previously observed samples. We describe two strategies based on pulling the distributions a number of times that is proportional to a high-probability upper-confidence-bound on their variance (built from previously observed samples) and report a finite-sample performance analysis on the excess estimation error compared to the optimal allocation. We show that the performance of these allocation strategies depends not only on the variances but also on the full shape of the distributions.

* 30 pages, 2 Postscript figures, uses elsarticle.cls; an earlier, shorter version was published in the Proceedings of the 22nd International Conference on Algorithmic Learning Theory
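As a rough illustration of the allocation idea in the abstract above, here is a minimal Python sketch. When the variances are known, the budget is split proportionally to them (`oracle_allocation`). When they are unknown, the sketch repeatedly samples the arm with the largest ratio of a variance upper bound to its current number of pulls, which pushes the pull counts toward proportionality with those bounds. The confidence width `c / sqrt(n)`, the parameters `c` and `init_pulls`, and all names here are hypothetical placeholders, not the exact bounds or algorithms analysed in the paper.

```python
import math
import random


class ArmStats:
    """Running mean and unbiased sample variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        return self.m2 / (self.n - 1)  # requires n >= 2


def oracle_allocation(variances, budget):
    """Known variances: split the budget proportionally to each variance."""
    total = sum(variances)
    return [budget * v / total for v in variances]


def ucb_variance_allocation(arms, budget, c=1.0, init_pulls=2):
    """Adaptive allocation driven by an upper bound on each arm's variance.

    `arms` is a list of zero-argument callables, each returning one sample.
    After `init_pulls` pulls per arm, the arm maximising
    (variance estimate + c / sqrt(n)) / n is sampled next; the width
    c / sqrt(n) is a hypothetical placeholder, not the paper's bound.
    """
    k = len(arms)
    if init_pulls < 2 or budget < k * init_pulls:
        raise ValueError("need init_pulls >= 2 and budget >= len(arms) * init_pulls")
    stats = [ArmStats() for _ in arms]
    for s, arm in zip(stats, arms):  # round-robin start so variances are defined
        for _ in range(init_pulls):
            s.update(arm())

    def index(s):
        return (s.variance() + c / math.sqrt(s.n)) / s.n

    for _ in range(budget - k * init_pulls):
        best = max(range(k), key=lambda i: index(stats[i]))
        stats[best].update(arms[best]())
    return [s.mean for s in stats], [s.n for s in stats]


# Usage: three Gaussian arms with standard deviations 0.1, 1.0 and 2.0.
rng = random.Random(0)
arms = [lambda sd=sd: rng.gauss(0.0, sd) for sd in (0.1, 1.0, 2.0)]
means, counts = ucb_variance_allocation(arms, budget=3000)
print("adaptive pulls:", counts)
print("oracle pulls:  ", oracle_allocation([0.1**2, 1.0**2, 2.0**2], 3000))
```

The high-variance arm should receive most of the budget, roughly tracking the oracle split, while the confidence width keeps the low-variance arm from being starved of samples.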

Toward a Classification of Finite Partial-Monitoring Games

Oct 11, 2011

András Antos, Gábor Bartók, Dávid Pál, Csaba Szepesvári

Abstract: Partial-monitoring games constitute a mathematical framework for sequential decision-making problems with imperfect feedback: the learner repeatedly chooses an action, the opponent responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his total cumulative loss. We make progress towards the classification of these games based on their minimax expected regret. Namely, we classify almost all games with two outcomes and a finite number of actions: we show that their minimax expected regret is either zero, $\widetilde{\Theta}(\sqrt{T})$, $\Theta(T^{2/3})$, or $\Theta(T)$, and we give a simple and efficiently computable classification of these four classes of games. Our hope is that the result can serve as a stepping stone toward classifying all finite partial-monitoring games.

* Submitted for review to Theoretical Computer Science (Special Issue of the conference Algorithmic Learning Theory 2010)
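The interaction protocol described in this abstract can be simulated in a few lines. The following is a minimal Python sketch assuming an oblivious opponent (a fixed outcome sequence) and regret measured against the best fixed action in hindsight. The example matrices `LOSS` and `FEEDBACK` (a hypothetical two-outcome game in which one action costs 1 but reveals the outcome) and the function `play` are illustrative assumptions, not games or code from the paper.

```python
import random

# Hypothetical two-outcome game: action 0 predicts outcome 0, action 1 predicts
# outcome 1 (loss 1 when wrong, no feedback); action 2 always costs 1 but
# reveals the outcome.
LOSS = [[0, 1], [1, 0], [1, 1]]
FEEDBACK = [[None, None], [None, None], [0, 1]]


def play(loss, feedback, outcomes, policy):
    """Run a partial-monitoring game against a fixed (oblivious) outcome sequence.

    loss[i][j] and feedback[i][j] are fixed functions of action i and outcome j.
    policy(history) picks the next action from the (action, feedback) pairs seen
    so far; it never observes losses or outcomes directly.
    Returns (total loss, regret against the best fixed action in hindsight).
    """
    history = []
    total = 0
    for j in outcomes:
        i = policy(history)
        total += loss[i][j]                  # loss is suffered but not shown
        history.append((i, feedback[i][j]))  # only the feedback signal is shown
    best_fixed = min(sum(row[j] for j in outcomes) for row in loss)
    return total, total - best_fixed


# Usage: a uniformly random learner on a short outcome sequence.
rng = random.Random(1)
outcomes = [0, 0, 1, 0, 1, 0]
print(play(LOSS, FEEDBACK, outcomes, lambda history: rng.randrange(3)))
```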

Non-trivial two-armed partial-monitoring games are bandits

Aug 24, 2011

András Antos, Gábor Bartók, Csaba Szepesvári

Abstract: We consider online learning in partial-monitoring games against an oblivious adversary. We show that when the number of actions available to the learner is two and the game is nontrivial, it is reducible to a bandit-like game, and thus the minimax regret is $\Theta(\sqrt{T})$.
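One way to make the "nontrivial" condition concrete, under an assumption: take a two-action game to be trivial when one action's loss is no larger than the other's on every outcome. Such an action is optimal against every outcome distribution, so always playing it gives zero regret. The minimal Python sketch below checks this condition with a hypothetical helper `is_trivial_two_action`; it does not implement the paper's reduction to a bandit-like game.

```python
def is_trivial_two_action(loss):
    """Check whether one of two actions dominates the other outcome by outcome.

    `loss` is a 2 x M list of lists: loss[i][j] is the loss of action i
    on outcome j. Assumption: "trivial" means a dominating action exists.
    """
    if len(loss) != 2:
        raise ValueError("expected exactly two actions")
    a, b = loss
    a_dominates = all(x <= y for x, y in zip(a, b))
    b_dominates = all(y <= x for x, y in zip(a, b))
    return a_dominates or b_dominates


print(is_trivial_two_action([[0, 1, 1], [1, 1, 2]]))  # True: action 0 never worse
print(is_trivial_two_action([[0, 1], [1, 0]]))        # False: each wins on one outcome
```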