
Ainur Zhaikhan

Multi-agent Off-policy Actor-Critic Reinforcement Learning for Partially Observable Environments

Jul 06, 2024

Ainur Zhaikhan, Ali H. Sayed

Figure 1 for Multi-agent Off-policy Actor-Critic Reinforcement Learning for Partially Observable Environments

Figure 2 for Multi-agent Off-policy Actor-Critic Reinforcement Learning for Partially Observable Environments

Figure 3 for Multi-agent Off-policy Actor-Critic Reinforcement Learning for Partially Observable Environments

Abstract: This study proposes the use of a social learning method to estimate a global state within a multi-agent off-policy actor-critic algorithm for reinforcement learning (RL) operating in a partially observable environment. We assume that the network of agents operates in a fully decentralized manner, with each agent able to exchange variables with its immediate neighbors. The proposed design methodology is supported by an analysis showing that the difference between the final outcomes obtained when the global state is fully observed and when it is estimated through social learning is $\varepsilon$-bounded, provided that a sufficient number of social learning iterations is performed. Unlike many existing Dec-POMDP-based RL approaches, the proposed algorithm is suitable for model-free multi-agent reinforcement learning because it does not require knowledge of a transition model. Furthermore, experimental results illustrate the efficacy of the algorithm and demonstrate its superiority over current state-of-the-art methods.
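The abstract does not spell out the social learning update. As a hedged illustration only, the sketch below assumes the standard log-linear (geometric-averaging) social learning rule: each agent first performs a local Bayesian update of its belief over candidate global states using its private observation, then combines its neighbors' log-beliefs through a combination matrix. The observation model, the combination weights `A`, and the function names are hypothetical choices for this example; the paper's exact rule (for instance an adaptive variant with a step size) may differ.

```python
import numpy as np


def logsumexp_rows(x):
    # Numerically stable log(sum(exp(x))) along each row, kept as a column.
    m = x.max(axis=1, keepdims=True)
    return m + np.log(np.exp(x - m).sum(axis=1, keepdims=True))


def social_learning_step(log_beliefs, log_likelihoods, A):
    """One adapt-then-combine social learning iteration (assumed rule).

    log_beliefs:     (K, H) row k = agent k's log-belief over H candidate states
    log_likelihoods: (K, H) row k = log L_k(xi_k | theta) for agent k's new observation
    A:               (K, K) left-stochastic; A[l, k] = weight agent k gives neighbor l
    """
    # Adapt: local Bayesian update with the private observation, then normalize.
    psi = log_beliefs + log_likelihoods
    psi = psi - logsumexp_rows(psi)
    # Combine: weighted geometric average of neighbors' intermediate beliefs.
    mu = A.T @ psi
    return mu - logsumexp_rows(mu)


# Hypothetical example: 3 agents, 2 candidate global states, true state 0.
rng = np.random.default_rng(0)
# p_one[k, theta] = P(xi_k = 1 | theta); agent 1 is deliberately uninformative.
p_one = np.array([[0.3, 0.7], [0.5, 0.5], [0.4, 0.6]])
A = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])  # symmetric, columns sum to 1
true_state = 0
log_beliefs = np.log(np.full((3, 2), 0.5))  # uniform initial beliefs

for _ in range(200):
    xi = rng.random(3) < p_one[:, true_state]        # private binary observations
    lik = np.where(xi[:, None], p_one, 1.0 - p_one)  # L_k(xi_k | theta) for each theta
    log_beliefs = social_learning_step(log_beliefs, np.log(lik), A)

estimates = log_beliefs.argmax(axis=1)  # each agent's estimate of the global state
print(estimates)
```

Even the uninformative agent ends up identifying the true state, because the combine step propagates evidence from informative neighbors; this is the property that lets a partially observing agent act on an estimated global state.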


Graph Exploration for Effective Multi-agent Q-Learning

Apr 19, 2023

Ainur Zhaikhan, Ali H. Sayed

Abstract: This paper proposes an exploration technique for multi-agent reinforcement learning (MARL) with graph-based communication among agents. We assume that the individual rewards received by the agents are independent of the actions of other agents, while their policies are coupled. In the proposed framework, neighbouring agents collaborate to estimate the uncertainty about the state-action space in order to explore more efficiently. Unlike existing works, the proposed algorithm does not require counting mechanisms and can be applied to continuous-state environments without complex conversion techniques. Moreover, the proposed scheme allows agents to communicate in a fully decentralized manner with minimal information exchange: in continuous-state scenarios, each agent needs to exchange only a single parameter vector. The performance of the algorithm is supported by theoretical results for discrete-state scenarios and by experiments for continuous-state ones.
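The abstract does not specify the uncertainty estimator. To illustrate only what "exchanging a single parameter vector" with neighbors can look like in a fully decentralized network, the sketch below uses a generic diffusion (adapt-then-combine) LMS step: each agent takes a local gradient step on its own data and then averages its parameter vector with its neighbors'. The linear regression target `w_true`, the ring topology, the step size `mu`, and all function names are hypothetical stand-ins, not the paper's method.

```python
import numpy as np


def diffusion_atc_step(W, grads, A, mu):
    """One diffusion adapt-then-combine update (generic, assumed).

    W:     (K, M) row k = agent k's parameter vector (the only thing exchanged)
    grads: (K, M) row k = gradient of agent k's local loss at W[k]
    A:     (K, K) left-stochastic; A[l, k] = weight agent k gives neighbor l
    """
    phi = W - mu * grads  # adapt: local gradient step on private data
    return A.T @ phi      # combine: average neighbors' intermediate vectors


# Hypothetical setup: 4 agents on a ring, 3-dimensional parameter vectors.
K, M = 4, 3
A = np.zeros((K, K))
for k in range(K):
    A[k, k] = 0.5
    A[(k - 1) % K, k] = 0.25  # left neighbor
    A[(k + 1) % K, k] = 0.25  # right neighbor

rng = np.random.default_rng(1)
w_true = np.array([1.0, -2.0, 0.5])  # hypothetical target parameters
W = np.zeros((K, M))

for _ in range(2000):
    X = rng.standard_normal((K, M))                 # one feature vector per agent
    y = X @ w_true + 0.01 * rng.standard_normal(K)  # noisy local targets
    err = np.einsum("km,km->k", X, W) - y           # per-agent prediction error
    grads = err[:, None] * X                        # gradient of 0.5 * err**2
    W = diffusion_atc_step(W, grads, A, mu=0.05)

print(W)
```

Each agent only ever sends its current vector `W[k]` to its immediate neighbors, yet all agents converge to a common estimate; any model whose state fits in one parameter vector can be shared the same way.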
