Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shatong Zhu

POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning

May 15, 2024

Chang Huang, Junqiao Zhao, Shatong Zhu, Hongtu Zhou, Chen Ye, Tiantian Feng, Changjun Jiang

Figure 1 for POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning

Figure 2 for POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning

Figure 3 for POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning

Figure 4 for POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning

Abstract:Value function factorization methods are commonly used in cooperative multi-agent reinforcement learning, with QMIX receiving significant attention. Many QMIX-based methods introduce monotonicity constraints between the joint action value and individual action values to achieve decentralized execution. However, such constraints limit the representation capacity of value factorization, restricting the joint action values it can represent and hindering the learning of the optimal policy. To address this challenge, we propose the Potentially Optimal joint actions Weighted QMIX (POWQMIX) algorithm, which recognizes the potentially optimal joint actions and assigns higher weights to the corresponding losses of these joint actions during training. We theoretically prove that with such a weighted training approach the optimal policy is guaranteed to be recovered. Experiments in matrix games, predator-prey, and StarCraft II Multi-Agent Challenge environments demonstrate that our algorithm outperforms the state-of-the-art value-based multi-agent reinforcement learning methods.

* change reference format

Via

Access Paper or Ask Questions

Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning

Feb 04, 2024

Lanqing Li, Hai Zhang, Xinyu Zhang, Shatong Zhu, Junqiao Zhao, Pheng-Ann Heng

Figure 1 for Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning

Figure 2 for Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning

Figure 3 for Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning

Figure 4 for Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning

Abstract:As a marriage between offline RL and meta-RL, the advent of offline meta-reinforcement learning (OMRL) has shown great promise in enabling RL agents to multi-task and quickly adapt while acquiring knowledge safely. Among which, Context-based OMRL (COMRL) as a popular paradigm, aims to learn a universal policy conditioned on effective task representations. In this work, by examining several key milestones in the field of COMRL, we propose to integrate these seemingly independent methodologies into a unified information theoretic framework. Most importantly, we show that the pre-existing COMRL algorithms are essentially optimizing the same mutual information objective between the task variable $\boldsymbol{M}$ and its latent representation $\boldsymbol{Z}$ by implementing various approximate bounds. Based on the theoretical insight and the information bottleneck principle, we arrive at a novel algorithm dubbed UNICORN, which exhibits remarkable generalization across a broad spectrum of RL benchmarks, context shift scenarios, data qualities and deep learning architectures, attaining the new state-of-the-art. We believe that our framework could open up avenues for new optimality bounds and COMRL algorithms.

* 20 pages, 8 figures, 5 tables. TLDR: We propose a novel information theoretic framework of the context-based offline meta-RL paradigm, which unifies several mainstream methods and leads to a general and state-of-the-art algorithm called UNICORN

Via

Access Paper or Ask Questions