Reinforcement learning (RL) in partially observable, fully cooperative multi-agent settings (Dec-POMDPs) can in principle be used to address many real-world challenges, such as controlling a swarm of rescue robots or a synchronized team of quadcopters. However, Dec-POMDPs are significantly harder to solve than single-agent problems: they are NEXP-complete, whereas MDPs are only P-complete. As a result, current RL algorithms for Dec-POMDPs suffer from poor sample complexity, which limits their applicability to practical problems where environment interaction is costly. Our key insight is that, using only a polynomial number of samples, one can learn a centralized model that generalizes across different policies. We can then optimize the policy within the learned model instead of the true system, reducing the number of environment interactions. Within the learned model, we also train a centralized exploration policy that collects additional data in state-action regions with high model uncertainty. Finally, we empirically evaluate the proposed model-based algorithm, MARCO, on three cooperative communication tasks, where it improves sample efficiency by up to 20x.
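
The abstract describes an alternating scheme: collect real data with an uncertainty-seeking exploration policy, fit a centralized model, then optimize the joint policy inside that model. The sketch below illustrates one plausible form of that loop; all names (`marco_loop`, `collect_rollouts`, the `model`/`policy` interfaces, the uncertainty reward) are illustrative assumptions for exposition, not the paper's actual implementation.

```python
# Illustrative sketch only: object interfaces (env, model, policies) are
# assumed placeholders, not MARCO's actual API.

def marco_loop(env, joint_policy, model, explore_policy,
               n_iters=50, real_steps=1_000, model_steps=10_000):
    """Alternate between (1) collecting real data with an exploration policy
    that targets high model uncertainty, (2) fitting a centralized model, and
    (3) optimizing the cooperative joint policy inside the learned model."""
    replay = []
    for _ in range(n_iters):
        # (1) Real-environment data collection, driven by the exploration
        #     policy so samples land in high-uncertainty state-action regions.
        replay += collect_rollouts(env, explore_policy, n_steps=real_steps)

        # (2) Fit the centralized dynamics/reward model on all data so far.
        model.fit(replay)

        # (3) Improve the joint policy with imagined rollouts from the
        #     learned model, saving costly real environment interaction.
        joint_policy.train_in_model(model, n_steps=model_steps)

        # (4) Improve the exploration policy inside the model, rewarding it
        #     with a model-uncertainty signal (e.g., ensemble disagreement).
        explore_policy.train_in_model(model, reward_fn=model.uncertainty,
                                      n_steps=model_steps)
    return joint_policy


def collect_rollouts(env, policy, n_steps):
    """Roll `policy` in `env` for `n_steps`; return (obs, act, rew, next_obs)."""
    data, obs = [], env.reset()
    for _ in range(n_steps):
        act = policy.act(obs)
        next_obs, rew, done, _ = env.step(act)
        data.append((obs, act, rew, next_obs))
        obs = env.reset() if done else next_obs
    return data
```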