Deep multi-agent reinforcement learning (MARL) holds the promise of automating many real-world cooperative robotic manipulation and transportation tasks. Nevertheless, decentralised cooperative robotic control has received less attention from the deep reinforcement learning community than single-agent robotics and multi-agent games with discrete actions. To address this gap, this paper introduces Multi-Agent MuJoCo, an easily extensible multi-agent benchmark suite for robotic control in continuous action spaces. The benchmark tasks are diverse and admit easily configurable partially observable settings. Inspired by the success of single-agent continuous value-based algorithms in robotic control, we also introduce COMIX, a novel extension of a common discrete-action multi-agent $Q$-learning algorithm to continuous action spaces. We show that COMIX significantly outperforms state-of-the-art MADDPG on a partially observable variant of a popular particle environment and matches or surpasses it on Multi-Agent MuJoCo. Thanks to this new benchmark suite and method, we can now pose an interesting question: what is the key to performance in such settings, the use of value-based methods instead of policy gradients, or the factorisation of the joint $Q$-function? To answer this question, we propose a second new method, FacMADDPG, which factorises MADDPG's critic. Experimental results on Multi-Agent MuJoCo suggest that factorisation is the key to performance.
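For concreteness, the kind of joint $Q$-function factorisation referred to above can be sketched as a monotonic mixing of per-agent utilities; the notation below ($Q_{\mathrm{tot}}$, $f_{\mathrm{mix}}$, per-agent utilities $Q_a$, action-observation histories $\tau^a$, actions $u^a$) is illustrative and is not defined in this abstract itself.
% Illustrative sketch only: one common form of joint Q-function factorisation,
% in which a monotonic mixing function combines per-agent utilities.
% All symbols are assumed notation, not taken from the abstract.
\[
  Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{u})
    = f_{\mathrm{mix}}\bigl(Q_1(\tau^1, u^1), \ldots, Q_n(\tau^n, u^n)\bigr),
  \qquad
  \frac{\partial Q_{\mathrm{tot}}}{\partial Q_a} \ge 0 \quad \forall a .
\]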