Pranjal Paliwal

Decentralized Multi-Agent Reinforcement Learning with Global State Prediction

Jun 22, 2023

Joshua Bloom, Pranjal Paliwal, Apratim Mukherjee, Carlo Pinciroli

Abstract: Deep reinforcement learning (DRL) has seen remarkable success in the control of single robots. However, applying DRL to robot swarms presents significant challenges. A critical challenge is non-stationarity, which occurs when two or more robots update individual or shared policies concurrently, thereby engaging in an interdependent training process with no guarantees of convergence. Circumventing non-stationarity typically involves training the robots with global information about other agents' states and/or actions. In contrast, in this paper we explore how to remove the need for global information. We pose our problem as a Partially Observable Markov Decision Process, due to the absence of global knowledge on other agents. Using collective transport as a testbed scenario, we study two approaches to multi-agent training. In the first, the robots exchange no messages, and are trained to rely on implicit communication through push-and-pull on the object to transport. In the second approach, we introduce Global State Prediction (GSP), a network trained to form a belief over the swarm as a whole and predict its future states. We provide a comprehensive study over four well-known deep reinforcement learning algorithms in environments with obstacles, measuring performance as the successful transport of the object to the goal within a desired time frame. Through an ablation study, we show that including GSP boosts performance and increases robustness when compared with methods that use global knowledge.
