Ayalvadi Ganesh

Asymptotic Optimality for Decentralised Bandits

Sep 20, 2021

Conor Newton, Ayalvadi Ganesh, Henry W. J. Reeve

Abstract: We consider a large number of agents collaborating on a multi-armed bandit problem with a large number of arms. The goal is to minimise the regret of each agent in a communication-constrained setting. We present a decentralised algorithm which builds upon and improves the Gossip-Insert-Eliminate method of Chawla et al. (arXiv:2001.05452). We provide a theoretical analysis of the regret incurred, which shows that our algorithm is asymptotically optimal. In fact, our regret guarantee matches the asymptotically optimal rate achievable in the full communication setting. Finally, we present empirical results which support our conclusions.

The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits

Feb 12, 2020

Ronshee Chawla, Abishek Sankararaman, Ayalvadi Ganesh, Sanjay Shakkottai

Abstract: We consider a decentralized multi-agent Multi-Armed Bandit (MAB) setup consisting of $N$ agents solving the same MAB instance to minimize individual cumulative regret. In our model, agents collaborate by exchanging messages through pairwise gossip-style communications on an arbitrary connected graph. We develop two novel algorithms in which each agent plays only from a subset of all the arms. Agents use the communication medium to recommend only arm-IDs (not samples), and thus update the set of arms from which they play. We establish that, if agents communicate $\Omega(\log(T))$ times through any connected pairwise gossip mechanism, then every agent's regret is a factor of order $N$ smaller compared to the case of no collaboration. Furthermore, we show that the communication constraints have only a second-order effect on the regret of our algorithm. We then analyze this second-order term of the regret to derive bounds on the regret-communication tradeoffs. Finally, we empirically evaluate our algorithm and conclude that the insights are fundamental and not artifacts of our bounds. We also show a lower bound which implies that the regret scaling obtained by our algorithm cannot be improved even in the absence of any communication constraints. Our results thus demonstrate that even a minimal level of collaboration among agents greatly reduces regret for all agents.

* To appear in AISTATS 2020. The first two authors contributed equally.
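
The mechanism described in this abstract (each agent plays from a subset of arms, and agents exchange only arm-IDs through gossip about $\log(T)$ times) can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' algorithm: it uses a plain UCB1 index, Bernoulli rewards, a complete communication graph, and gossip at times 2, 4, 8, ..., and it only inserts recommended arms without ever eliminating any. The names `Agent`, `ucb_index`, and `simulate` are hypothetical.

```python
import math
import random


def ucb_index(mean, pulls, t):
    """UCB1 index (an assumed choice); unplayed arms are tried first."""
    if pulls == 0:
        return float("inf")
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


class Agent:
    def __init__(self, arm_ids):
        self.active = set(arm_ids)            # arms this agent may play
        self.pulls = {a: 0 for a in arm_ids}  # pull counts per arm
        self.means = {a: 0.0 for a in arm_ids}

    def add_arm(self, arm_id):
        # Insert a recommended arm-ID; no samples are transferred.
        self.active.add(arm_id)
        self.pulls.setdefault(arm_id, 0)
        self.means.setdefault(arm_id, 0.0)

    def play(self, t, true_means, rng):
        arm = max(sorted(self.active),
                  key=lambda a: ucb_index(self.means[a], self.pulls[a], t))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        self.pulls[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.pulls[arm]
        return arm

    def recommend(self):
        # Recommend the ID of the most-played active arm.
        return max(sorted(self.active), key=lambda a: self.pulls[a])


def simulate(true_means, n_agents, horizon, seed=0):
    k = len(true_means)
    if n_agents < 2 or n_agents > k:
        raise ValueError("this sketch assumes 2 <= n_agents <= number of arms")
    rng = random.Random(seed)
    # Round-robin partition of the arms among the agents.
    agents = [Agent([a for a in range(k) if a % n_agents == i])
              for i in range(n_agents)]
    best = max(true_means)
    regret = [0.0] * n_agents
    next_gossip = 2
    for t in range(1, horizon + 1):
        for i, agent in enumerate(agents):
            arm = agent.play(t, true_means, rng)
            regret[i] += best - true_means[arm]   # expected (pseudo-)regret
        if t == next_gossip:
            # Each agent asks one uniformly random other agent for an arm-ID.
            recs = []
            for i in range(n_agents):
                j = rng.choice([x for x in range(n_agents) if x != i])
                recs.append(agents[j].recommend())
            for agent, rec in zip(agents, recs):
                agent.add_arm(rec)
            next_gossip *= 2                      # about log2(T) gossip rounds
    return agents, regret


if __name__ == "__main__":
    agents, regret = simulate([0.1, 0.2, 0.3, 0.9], n_agents=2, horizon=2000)
    print([sorted(a.active) for a in agents], regret)
```

Because arms are only ever inserted here, an agent's active set can grow over time; the algorithms in the paper also eliminate arms, which this sketch deliberately leaves out.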

Social Learning in Multi Agent Multi Armed Bandits

Nov 05, 2019

Abishek Sankararaman, Ayalvadi Ganesh, Sanjay Shakkottai

Abstract: In this paper, we introduce a distributed version of the classical stochastic Multi-Armed Bandit (MAB) problem. Our setting consists of a large number of agents $n$ that collaboratively and simultaneously solve the same instance of a $K$-armed MAB to minimize the average cumulative regret over all agents. The agents can communicate and collaborate with each other only through a pairwise asynchronous gossip-based protocol that exchanges a limited number of bits. In our model, agents at each point decide (i) which arm to play, (ii) whether to communicate, and if so, (iii) what to communicate and with whom. Agents in our model are decentralized; namely, their actions depend only on their own observed history. We develop a novel algorithm in which agents, whenever they choose, communicate only arm-IDs and not samples, with another agent chosen uniformly and independently at random. The per-agent regret scaling achieved by our algorithm is $O \left( \frac{\lceil\frac{K}{n}\rceil+\log(n)}{\Delta} \log(T) + \frac{\log^3(n) \log \log(n)}{\Delta^2} \right)$. Furthermore, any agent in our algorithm communicates only a total of $\Theta(\log(T))$ times over a time interval of $T$. We compare our results to two benchmarks: one in which there is no communication among agents and one corresponding to complete interaction. We show, both theoretically and empirically, that our algorithm achieves a significant reduction both in per-agent regret when compared to the case in which agents do not collaborate, and in communication complexity when compared to the full interaction setting, which requires $T$ communication attempts by an agent over $T$ arm pulls.

* Minor Corrections from before
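
To get a feel for the per-agent regret scaling quoted in this abstract, the expression inside the $O(\cdot)$ can be evaluated numerically. The sketch below is illustrative only: it drops the unknown constants hidden by the $O(\cdot)$ notation, assumes natural logarithms, and requires $n \ge 3$ so that $\log \log(n)$ is positive. The comparison function uses the standard $K \log(T) / \Delta$ scaling for a single non-collaborating agent, which is an assumption added here and not a formula taken from the abstract. Both function names are hypothetical.

```python
import math


def per_agent_regret_scale(k, n, delta, horizon):
    """Expression inside the O(.) of the abstract, constants omitted."""
    if n < 3 or k < 1 or delta <= 0 or horizon < 2:
        raise ValueError("assumes n >= 3, k >= 1, delta > 0, horizon >= 2")
    # First-order term: (ceil(K/n) + log n) / Delta * log T
    first = (math.ceil(k / n) + math.log(n)) / delta * math.log(horizon)
    # Second-order term: log^3(n) * log log(n) / Delta^2 (independent of T)
    second = math.log(n) ** 3 * math.log(math.log(n)) / delta ** 2
    return first + second


def single_agent_scale(k, delta, horizon):
    """Assumed no-collaboration baseline: K * log(T) / Delta."""
    return k / delta * math.log(horizon)


if __name__ == "__main__":
    k, n, delta, horizon = 100, 10, 0.1, 10 ** 4
    print(per_agent_regret_scale(k, n, delta, horizon))
    print(single_agent_scale(k, delta, horizon))
```

Since the second term does not depend on $T$, the first term dominates for large horizons, which is where the roughly $n$-fold reduction relative to the baseline becomes visible.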
