Title: Multi-agent Actor-Critic with Time Dynamical Opponent Model

Apr 12, 2022

Yuan Tian, Klaus-Rudolf Kladny, Qin Wang, Zhiwu Huang, Olga Fink

Abstract: In multi-agent reinforcement learning, multiple agents learn simultaneously while interacting with a common environment and with each other. Because the agents adapt their policies during learning, not only does the behavior of each individual agent become non-stationary, but so does the environment as perceived by that agent. This makes policy improvement particularly challenging. In this paper, we propose to exploit the fact that the agents seek to improve their expected cumulative reward, and we introduce a novel Time Dynamical Opponent Model (TDOM) that encodes the knowledge that opponent policies tend to improve over time. We motivate TDOM theoretically by deriving a lower bound on the log objective of an individual agent, and we further propose Multi-Agent Actor-Critic with Time Dynamical Opponent Model (TDOM-AC). We evaluate the proposed TDOM-AC on a differential game and on the Multi-agent Particle Environment. We show empirically that TDOM achieves superior opponent behavior prediction at test time. In our experiments, TDOM-AC outperforms state-of-the-art Actor-Critic methods in cooperative and, especially, in mixed cooperative-competitive environments. TDOM-AC also yields more stable training and faster convergence.
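The abstract does not specify how TDOM is parameterized. As a rough illustration of the underlying intuition only (an opponent that maximizes its return should, over time, become more likely to pick higher-value actions), the sketch below applies a generic KL-regularized soft policy-improvement step, pi_new(a) ∝ pi_old(a) · exp(Q(a) / alpha), to a discrete opponent policy. This is a swapped-in, textbook technique and not the paper's TDOM; the function names, the temperature `alpha`, and the toy Q-values are all hypothetical.

```python
import math


def improved_opponent_policy(pi_old, q_values, alpha=1.0):
    """Hypothetical soft-improvement step for a discrete opponent policy.

    Computes pi_new(a) proportional to pi_old(a) * exp(Q(a) / alpha).
    This is an illustration of "opponents improve over time",
    NOT the TDOM model from the paper.
    """
    # Subtract the max Q-value before exponentiating for numerical stability;
    # the shift cancels out after normalization.
    q_max = max(q_values)
    weights = [p * math.exp((q - q_max) / alpha) for p, q in zip(pi_old, q_values)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_q(pi, q_values):
    """Expected action value of policy pi under the given Q-values."""
    return sum(p * q for p, q in zip(pi, q_values))


# Toy example with three opponent actions (hypothetical numbers).
pi_old = [0.5, 0.3, 0.2]
q = [1.0, 2.0, 0.0]
pi_new = improved_opponent_policy(pi_old, q, alpha=1.0)
print(pi_new, expected_q(pi_old, q), expected_q(pi_new, q))
```

Exponential tilting of this kind never decreases the expected Q-value, so a model that predicts the opponent's next policy this way builds in the assumption that the opponent gets better; a smaller `alpha` predicts faster improvement.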
