Abstract: Min-max problems are important in multi-agent sequential decision-making because they improve the performance of the worst-performing agent in the network. However, solving the multi-agent min-max problem is challenging. We propose a modular, distributed, online planning-based algorithm that approximates the solution of the min-max objective in networked Markov games, assuming that the agents communicate within a network topology and that the transition and reward functions are neighborhood-dependent. This setup is commonly encountered in multi-robot settings. Our method consists of two phases at every planning step. In the first phase, each agent performs online planning to obtain sample returns based on its local reward function. Using these samples, each agent constructs a concave approximation of its underlying local return as a function of only its neighborhood's actions at the next planning step. In the second phase, the agents deploy a distributed optimization framework that converges to the optimal immediate next action for each agent, based on the function approximations from the first phase. We demonstrate our algorithm's performance through formation control simulations.
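To make the two-phase structure concrete, the sketch below mocks a single planning step under simplifying assumptions: online planning is replaced by noisy samples of a concave local return, the surrogate is a concave quadratic fit by least squares, and the distributed min-max optimization of the second phase is collapsed into a centralized subgradient ascent on the worst agent's surrogate. All names (`plan_step`, `fit_concave_quadratic`, `sample_local_return`) and numerical details are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, action_dim = 3, 2

def sample_local_return(i, a):
    """Stand-in for agent i's online-planning rollout: a noisy concave return."""
    target = np.full(action_dim, 0.3 * (i + 1))
    return -np.sum((a - target) ** 2) + 0.05 * rng.standard_normal()

def fit_concave_quadratic(samples):
    """Fit R(a) = -||a||^2 + w.a + b by least squares on (action, return) pairs."""
    A = np.array([np.append(a, 1.0) for a, _ in samples])
    y = np.array([r + np.sum(a ** 2) for a, r in samples])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    w, b = coef[:-1], coef[-1]
    return (lambda a: -np.sum(a ** 2) + w @ a + b), w

def plan_step(n_rollouts=50, ascent_iters=200, lr=0.05):
    # Phase 1: sample returns via (mocked) online planning, fit concave surrogates.
    surrogates = []
    for i in range(n_agents):
        samples = [(a, sample_local_return(i, a))
                   for a in rng.uniform(0.0, 1.0, size=(n_rollouts, action_dim))]
        surrogates.append(fit_concave_quadratic(samples))

    # Phase 2: subgradient ascent on min_i R_i(a); a faithful implementation would
    # run this as a distributed consensus scheme over the communication graph.
    a = np.full(action_dim, 0.5)
    for _ in range(ascent_iters):
        worst = min(range(n_agents), key=lambda i: surrogates[i][0](a))
        _, w = surrogates[worst]
        a = np.clip(a + lr * (-2.0 * a + w), 0.0, 1.0)  # gradient of worst surrogate
    return a

print("next joint action:", plan_step())
```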
Abstract: Cooperative artificial intelligence with human or superhuman proficiency in collaborative tasks stands at the frontier of machine learning research. Prior work has tended to evaluate cooperative AI performance under the restrictive paradigms of self-play (teams composed of agents trained together) and cross-play (teams of agents trained independently but with the same algorithm). Recent work has indicated that AI optimized for these narrow settings may make for undesirable collaborators in the real world. We formalize an alternative criterion for evaluating cooperative AI, referred to as inter-algorithm cross-play, where agents are evaluated on teaming performance with all other agents in an experiment pool, with no assumption of algorithmic similarity between agents. We show that existing state-of-the-art cooperative AI algorithms, such as Other-Play and Off-Belief Learning, underperform in this paradigm. We propose the Any-Play learning augmentation -- a multi-agent extension of diversity-based intrinsic rewards for zero-shot coordination (ZSC) -- for generalizing self-play-based algorithms to the inter-algorithm cross-play setting. We apply the Any-Play learning augmentation to the Simplified Action Decoder (SAD) and demonstrate state-of-the-art performance in the collaborative card game Hanabi.
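As a rough illustration of the kind of diversity-based intrinsic reward that Any-Play builds on (extended to the multi-agent setting in the paper), the sketch below uses a DIAYN-style discriminator over latent "play styles". The `SkillDiscriminator` class, the reward shaping, and the toy usage are assumptions for exposition, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkillDiscriminator(nn.Module):
    """Predicts which latent 'play style' z produced an observed state."""
    def __init__(self, obs_dim, n_skills, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_skills),
        )

    def forward(self, obs):
        return self.net(obs)  # logits over play styles

def diversity_intrinsic_reward(disc, obs, skill_idx, n_skills):
    """DIAYN-style bonus log q(z|s) - log p(z): larger when the sampled play
    style is identifiable from the agent's behavior (uniform prior assumed)."""
    with torch.no_grad():
        log_q = F.log_softmax(disc(obs), dim=-1)
        r_int = log_q.gather(-1, skill_idx.unsqueeze(-1)).squeeze(-1)
    return r_int + torch.log(torch.tensor(float(n_skills)))

# Toy usage: batch of 4 observations, 8 candidate play styles.
disc = SkillDiscriminator(obs_dim=10, n_skills=8)
obs = torch.randn(4, 10)
z = torch.randint(0, 8, (4,))
print(diversity_intrinsic_reward(disc, obs, z, n_skills=8))
```

In training, a typical use would add a scaled version of this bonus to the task reward while the discriminator learns to classify the sampled play style from observed behavior.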
Abstract: Deep reinforcement learning has generated superhuman AI in competitive games such as Go and StarCraft. Can similar learning techniques create a superior AI teammate for human-machine collaborative games? Will humans prefer AI teammates that improve objective team performance or those that improve subjective metrics of trust? In this study, we perform a single-blind evaluation of teams of humans and AI agents in the cooperative card game Hanabi, with both rule-based and learning-based agents. In addition to the game score, used as an objective metric of human-AI team performance, we also quantify subjective measures of the humans' perceived performance, teamwork, interpretability, trust, and overall preference for their AI teammate. We find that humans have a clear preference for a rule-based AI teammate (SmartBot) over a state-of-the-art learning-based AI teammate (Other-Play) across nearly all subjective metrics, and generally view the learning-based agent negatively, despite no statistical difference in game score. This result has implications for future AI design and reinforcement learning benchmarking, highlighting the need to incorporate subjective metrics of human-AI teaming rather than focusing solely on objective task performance.
Abstract: This paper proposes a definition of system health in the context of multiple agents optimizing a joint reward function. We use this definition as a credit assignment term in a policy gradient algorithm to distinguish the contributions of individual agents to the global reward. The health-informed credit assignment is then extended to a multi-agent variant of the proximal policy optimization algorithm and demonstrated on simple particle environments that have elements of system health, risk-taking, semi-expendable agents, and partial observability. We show significant improvement in learning performance compared to policy gradient methods that do not perform multi-agent credit assignment.
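A minimal sketch of how a health-informed credit term could enter a policy-gradient update follows. The proportional-to-health-change weighting, the helper names (`health_credit_weights`, `credit_weighted_policy_gradient`), and the toy data are illustrative assumptions rather than the paper's definition of system health or its exact credit assignment.

```python
import numpy as np

def health_credit_weights(health, eps=1e-8):
    """Toy credit term: attribute the global reward in proportion to each
    agent's absolute change in health at that step (an assumption here)."""
    delta = np.abs(np.diff(health, axis=0, prepend=health[:1]))  # (T, n_agents)
    return delta / (delta.sum(axis=1, keepdims=True) + eps)

def credit_weighted_policy_gradient(log_probs, global_rewards, health):
    """Per-agent REINFORCE-style surrogate loss: credit_{i,t} * R_t * log pi_i."""
    credit = health_credit_weights(health)               # (T, n_agents)
    per_agent_return = credit * global_rewards[:, None]  # split the joint reward
    return -(log_probs * per_agent_return).sum(axis=0)   # one loss value per agent

# Toy usage: 5 timesteps, 3 agents.
rng = np.random.default_rng(0)
T, N = 5, 3
loss = credit_weighted_policy_gradient(
    log_probs=rng.normal(size=(T, N)),
    global_rewards=rng.random(T),
    health=rng.random((T, N)),
)
print(loss)
```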