Title: STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning

Apr 15, 2023

Sirui Chen, Zhaowei Zhang, Yali Du, Yaodong Yang

Figures 1-4 for STAS: Spatial-Temporal Return Decomposition for Multi-agent Reinforcement Learning (images not reproduced)

Abstract: Centralized Training with Decentralized Execution (CTDE) has proven to be an effective paradigm in cooperative multi-agent reinforcement learning (MARL). One of its major challenges is credit assignment, which aims to credit agents according to their contributions. Prior studies focus either on implicitly decomposing the joint value function or on explicitly computing the payoff distribution of all agents. However, in episodic reinforcement learning settings where global rewards are revealed only at the end of the episode, existing methods often fail: they cannot model the complicated relations of the delayed global reward in the temporal dimension, and they suffer from high variance and bias. We propose a novel method named Spatial-Temporal Attention with Shapley (STAS) for return decomposition; STAS learns credit assignment in both the temporal and the spatial dimensions. It first decomposes the global return back to each time step, then uses the Shapley value to redistribute individual payoffs from the decomposed global reward. To mitigate the computational complexity of the Shapley value, we introduce an approximation of the marginal contribution and use Monte Carlo sampling to estimate the Shapley value. We evaluate our method on the classical Alice & Bob example and on Multi-agent Particle Environments benchmarks across different scenarios, and we show that our method achieves effective spatial-temporal credit assignment and outperforms all state-of-the-art baselines.
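To make the "Monte Carlo sampling to estimate the Shapley value" step concrete, here is a minimal sketch of the generic permutation-sampling estimator. It is not the STAS implementation: where STAS uses an approximation of the marginal contribution, this sketch calls an exact, hand-written toy value function. The names `monte_carlo_shapley` and `make_value`, the per-step rewards in `step_rewards`, and the "any two agents succeed" toy game are all hypothetical. The per-step rewards stand in for the output of a temporal decomposition of an episodic return of 3.0.

```python
import random
from typing import Callable, FrozenSet, List


def monte_carlo_shapley(
    n_agents: int,
    value: Callable[[FrozenSet[int]], float],
    n_samples: int = 1000,
    seed: int = 0,
) -> List[float]:
    """Estimate each agent's Shapley value by sampling random agent orderings.

    For each sampled permutation, an agent's marginal contribution is
    value(predecessors + agent) - value(predecessors). Averaging these
    over permutations gives an unbiased estimate of the Shapley value.
    """
    rng = random.Random(seed)
    agents = list(range(n_agents))
    totals = [0.0] * n_agents
    for _ in range(n_samples):
        rng.shuffle(agents)  # one random ordering of the agents
        coalition: FrozenSet[int] = frozenset()
        prev_value = value(coalition)
        for agent in agents:
            coalition = coalition | {agent}
            new_value = value(coalition)
            totals[agent] += new_value - prev_value  # marginal contribution
            prev_value = new_value
    return [t / n_samples for t in totals]


# Hypothetical per-step rewards, assumed to come from a temporal
# decomposition of an episodic (delayed) return R = 0.5 + 1.0 + 1.5 = 3.0.
step_rewards = [0.5, 1.0, 1.5]
N_AGENTS = 3


def make_value(r_t: float) -> Callable[[FrozenSet[int]], float]:
    """Hypothetical toy game: a coalition earns r_t only if it has >= 2 agents."""
    def value(coalition: FrozenSet[int]) -> float:
        return r_t if len(coalition) >= 2 else 0.0
    return value


# Spatial step: split each per-step reward among the agents.
credits = [
    monte_carlo_shapley(N_AGENTS, make_value(r), n_samples=2000)
    for r in step_rewards
]
# Each agent's credit for the whole episode.
agent_totals = [sum(step[i] for step in credits) for i in range(N_AGENTS)]
print(agent_totals)
```

In this symmetric toy game the exact Shapley value gives each agent one third of every per-step reward, so each `agent_totals` entry should land close to 1.0. Because every sampled ordering's marginal contributions telescope to `value(all agents) - value(empty)`, the estimated credits for each step always sum exactly to that step's reward. This is the efficiency property that lets a Shapley-based split redistribute the decomposed reward without losing or inventing any of it.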
