Abstract: We explore value-based multi-agent reinforcement learning (MARL) in the popular paradigm of centralized training with decentralized execution (CTDE). CTDE requires consistency between the optimal joint action selection and the optimal individual action selections, a requirement known as the IGM (Individual-Global-Max) principle. However, to achieve scalability, existing MARL methods either limit the representation expressiveness of their value function classes or relax the IGM consistency, which may lead to poor policies or even divergence. This paper presents a novel MARL approach, called duPLEX dueling multi-agent Q-learning (QPLEX), which takes a duplex dueling network architecture to factorize the joint value function. This duplex dueling architecture transforms the IGM principle into easily realized constraints on advantage functions and thus enables efficient value function learning. Theoretical analysis shows that QPLEX solves a rich class of tasks. Empirical experiments on StarCraft II unit micromanagement tasks demonstrate that QPLEX significantly outperforms state-of-the-art baselines in both online and offline task settings, and also reveal that QPLEX achieves high sample efficiency and can benefit from offline datasets without additional exploration.
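For concreteness, a minimal sketch of the IGM condition referenced above, stated in standard CTDE notation (the symbols $Q_{tot}$, $Q_i$, the joint history $\boldsymbol{\tau}$, and the per-agent histories $\tau_i$ are assumed here and are not defined in this abstract): the joint greedy action must decompose into the per-agent greedy actions,
\[
\operatorname*{arg\,max}_{\boldsymbol{a}} Q_{tot}(\boldsymbol{\tau}, \boldsymbol{a})
=
\Big(
\operatorname*{arg\,max}_{a_1} Q_1(\tau_1, a_1),\;
\ldots,\;
\operatorname*{arg\,max}_{a_n} Q_n(\tau_n, a_n)
\Big),
\]
so that greedy decentralized execution with the individual utilities $Q_i$ remains consistent with greedy action selection under the centralized joint value $Q_{tot}$.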