Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jonathan Shock

Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation

Jun 13, 2024

Claude Formanek, Callum Rhys Tilbury, Louise Beyers, Jonathan Shock, Arnu Pretorius

Figure 1 for Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation

Figure 2 for Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation

Figure 3 for Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation

Figure 4 for Dispelling the Mirage of Progress in Offline MARL through Standardised Baselines and Evaluation

Abstract:Offline multi-agent reinforcement learning (MARL) is an emerging field with great promise for real-world applications. Unfortunately, the current state of research in offline MARL is plagued by inconsistencies in baselines and evaluation protocols, which ultimately makes it difficult to accurately assess progress, trust newly proposed innovations, and allow researchers to easily build upon prior work. In this paper, we firstly identify significant shortcomings in existing methodologies for measuring the performance of novel algorithms through a representative study of published offline MARL work. Secondly, by directly comparing to this prior work, we demonstrate that simple, well-implemented baselines can achieve state-of-the-art (SOTA) results across a wide range of tasks. Specifically, we show that on 35 out of 47 datasets used in prior work (almost 75% of cases), we match or surpass the performance of the current purported SOTA. Strikingly, our baselines often substantially outperform these more sophisticated algorithms. Finally, we correct for the shortcomings highlighted from this prior work by introducing a straightforward standardised methodology for evaluation and by providing our baseline implementations with statistically robust results across several scenarios, useful for comparisons in future work. Our proposal includes simple and sensible steps that are easy to adopt, which in combination with solid baselines and comparative results, could substantially improve the overall rigour of empirical science in offline MARL moving forward.

Via

Access Paper or Ask Questions

Reduce, Reuse, Recycle: Selective Reincarnation in Multi-Agent Reinforcement Learning

Mar 31, 2023

Claude Formanek, Callum Rhys Tilbury, Jonathan Shock, Kale-ab Tessera, Arnu Pretorius

Abstract:'Reincarnation' in reinforcement learning has been proposed as a formalisation of reusing prior computation from past experiments when training an agent in an environment. In this paper, we present a brief foray into the paradigm of reincarnation in the multi-agent (MA) context. We consider the case where only some agents are reincarnated, whereas the others are trained from scratch -- selective reincarnation. In the fully-cooperative MA setting with heterogeneous agents, we demonstrate that selective reincarnation can lead to higher returns than training fully from scratch, and faster convergence than training with full reincarnation. However, the choice of which agents to reincarnate in a heterogeneous system is vitally important to the outcome of the training -- in fact, a poor choice can lead to considerably worse results than the alternatives. We argue that a rich field of work exists here, and we hope that our effort catalyses further energy in bringing the topic of reincarnation to the multi-agent realm.

* Accepted as oral presentation at Reincarnating Reinforcement Learning workshop at ICLR 2023

Via

Access Paper or Ask Questions

Off-the-Grid MARL: a Framework for Dataset Generation with Baselines for Cooperative Offline Multi-Agent Reinforcement Learning

Feb 01, 2023

Claude Formanek, Asad Jeewa, Jonathan Shock, Arnu Pretorius

Abstract:Being able to harness the power of large, static datasets for developing autonomous multi-agent systems could unlock enormous value for real-world applications. Many important industrial systems are multi-agent in nature and are difficult to model using bespoke simulators. However, in industry, distributed system processes can often be recorded during operation, and large quantities of demonstrative data can be stored. Offline multi-agent reinforcement learning (MARL) provides a promising paradigm for building effective online controllers from static datasets. However, offline MARL is still in its infancy, and, therefore, lacks standardised benchmarks, baselines and evaluation protocols typically found in more mature subfields of RL. This deficiency makes it difficult for the community to sensibly measure progress. In this work, we aim to fill this gap by releasing \emph{off-the-grid MARL (OG-MARL)}: a framework for generating offline MARL datasets and algorithms. We release an initial set of datasets and baselines for cooperative offline MARL, created using the framework, along with a standardised evaluation protocol. Our datasets provide settings that are characteristic of real-world systems, including complex dynamics, non-stationarity, partial observability, suboptimality and sparse rewards, and are generated from popular online MARL benchmarks. We hope that OG-MARL will serve the community and help steer progress in offline MARL, while also providing an easy entry point for researchers new to the field.

Via

Access Paper or Ask Questions

Causal Multi-Agent Reinforcement Learning: Review and Open Problems

Dec 01, 2021

St John Grimbly, Jonathan Shock, Arnu Pretorius

Figure 1 for Causal Multi-Agent Reinforcement Learning: Review and Open Problems

Figure 2 for Causal Multi-Agent Reinforcement Learning: Review and Open Problems

Figure 3 for Causal Multi-Agent Reinforcement Learning: Review and Open Problems

Abstract:This paper serves to introduce the reader to the field of multi-agent reinforcement learning (MARL) and its intersection with methods from the study of causality. We highlight key challenges in MARL and discuss these in the context of how causal methods may assist in tackling them. We promote moving toward a 'causality first' perspective on MARL. Specifically, we argue that causality can offer improved safety, interpretability, and robustness, while also providing strong theoretical guarantees for emergent behaviour. We discuss potential solutions for common challenges, and use this context to motivate future research directions.

* Accepted at Cooperative AI Workshop, NeurIPS 2021

Via

Access Paper or Ask Questions

Mava: a research framework for distributed multi-agent reinforcement learning

Jul 03, 2021

Arnu Pretorius, Kale-ab Tessera, Andries P. Smit, Claude Formanek, St John Grimbly, Kevin Eloff, Siphelele Danisa, Lawrence Francis, Jonathan Shock, Herman Kamper(+4 more)

Figure 1 for Mava: a research framework for distributed multi-agent reinforcement learning

Figure 2 for Mava: a research framework for distributed multi-agent reinforcement learning

Figure 3 for Mava: a research framework for distributed multi-agent reinforcement learning

Figure 4 for Mava: a research framework for distributed multi-agent reinforcement learning

Abstract:Breakthrough advances in reinforcement learning (RL) research have led to a surge in the development and application of RL. To support the field and its rapid growth, several frameworks have emerged that aim to help the community more easily build effective and scalable agents. However, very few of these frameworks exclusively support multi-agent RL (MARL), an increasingly active field in itself, concerned with decentralised decision-making problems. In this work, we attempt to fill this gap by presenting Mava: a research framework specifically designed for building scalable MARL systems. Mava provides useful components, abstractions, utilities and tools for MARL and allows for simple scaling for multi-process system training and execution, while providing a high level of flexibility and composability. Mava is built on top of DeepMind's Acme \citep{hoffman2020acme}, and therefore integrates with, and greatly benefits from, a wide range of already existing single-agent RL components made available in Acme. Several MARL baseline systems have already been implemented in Mava. These implementations serve as examples showcasing Mava's reusable features, such as interchangeable system architectures, communication and mixing modules. Furthermore, these implementations allow existing MARL algorithms to be easily reproduced and extended. We provide experimental results for these implementations on a wide range of multi-agent environments and highlight the benefits of distributed system training.

* 12 pages, 6 figures

Via

Access Paper or Ask Questions

A game-theoretic analysis of networked system control for common-pool resource management using multi-agent reinforcement learning

Oct 15, 2020

Arnu Pretorius, Scott Cameron, Elan van Biljon, Tom Makkink, Shahil Mawjee, Jeremy du Plessis, Jonathan Shock, Alexandre Laterre, Karim Beguir

Figure 1 for A game-theoretic analysis of networked system control for common-pool resource management using multi-agent reinforcement learning

Figure 2 for A game-theoretic analysis of networked system control for common-pool resource management using multi-agent reinforcement learning

Figure 3 for A game-theoretic analysis of networked system control for common-pool resource management using multi-agent reinforcement learning

Figure 4 for A game-theoretic analysis of networked system control for common-pool resource management using multi-agent reinforcement learning

Abstract:Multi-agent reinforcement learning has recently shown great promise as an approach to networked system control. Arguably, one of the most difficult and important tasks for which large scale networked system control is applicable is common-pool resource management. Crucial common-pool resources include arable land, fresh water, wetlands, wildlife, fish stock, forests and the atmosphere, of which proper management is related to some of society's greatest challenges such as food security, inequality and climate change. Here we take inspiration from a recent research program investigating the game-theoretic incentives of humans in social dilemma situations such as the well-known tragedy of the commons. However, instead of focusing on biologically evolved human-like agents, our concern is rather to better understand the learning and operating behaviour of engineered networked systems comprising general-purpose reinforcement learning agents, subject only to nonbiological constraints such as memory, computation and communication bandwidth. Harnessing tools from empirical game-theoretic analysis, we analyse the differences in resulting solution concepts that stem from employing different information structures in the design of networked multi-agent systems. These information structures pertain to the type of information shared between agents as well as the employed communication protocol and network topology. Our analysis contributes new insights into the consequences associated with certain design choices and provides an additional dimension of comparison between systems beyond efficiency, robustness, scalability and mean control performance.

* 17 pages, 16 Figures, to appear in Advances of Neural Information Processing Systems (NeurIPS) conference, 2020

Via

Access Paper or Ask Questions