Title: Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning

Oct 11, 2022

Anton Bakhtin, David J Wu, Adam Lerer, Jonathan Gray, Athul Paul Jacob, Gabriele Farina, Alexander H Miller, Noam Brown

Abstract: No-press Diplomacy is a complex strategy game involving both cooperation and competition that has served as a benchmark for multi-agent AI research. While self-play reinforcement learning has resulted in numerous successes in purely adversarial games like chess, Go, and poker, self-play alone is insufficient for achieving optimal performance in domains involving cooperation with humans. We address this shortcoming by first introducing a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy. We prove that this is a no-regret learning algorithm under a modified utility function. We then show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL that provides a model of human play while simultaneously training an agent that responds well to this human model. We used RL-DiL-piKL to train an agent we name Diplodocus. In a 200-game no-press Diplomacy tournament involving 62 human participants spanning skill levels from beginner to expert, two Diplodocus agents both achieved a higher average score than all other participants who played more than two games, and ranked first and third according to an Elo ratings model.
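To make the phrase "regularizes a reward-maximizing policy toward a human imitation-learned policy" concrete, here is a minimal sketch of the general idea for a single one-shot decision. It is not the paper's DiL-piKL algorithm. It assumes a fixed vector of action values, a fixed anchor policy standing in for the imitation-learned policy, and a single regularization strength `lam`; the function name and all parameter names are hypothetical. Under those assumptions, maximizing expected value minus `lam` times the KL divergence from the anchor has the well-known closed form pi(a) proportional to anchor(a) * exp(Q(a) / lam).

```python
import math


def kl_regularized_policy(q_values, anchor_policy, lam):
    """Return the policy maximizing E_pi[Q] - lam * KL(pi || anchor).

    Illustrative only: q_values, anchor_policy and lam are hypothetical
    inputs, not quantities taken from the paper.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if len(q_values) != len(anchor_policy):
        raise ValueError("q_values and anchor_policy must have equal length")
    if not any(p > 0 for p in anchor_policy):
        raise ValueError("anchor_policy must put mass on at least one action")

    # Work in log space for numerical stability: log anchor(a) + Q(a) / lam.
    # Actions with zero anchor probability keep zero probability.
    logits = [
        math.log(p) + q / lam if p > 0 else float("-inf")
        for q, p in zip(q_values, anchor_policy)
    ]
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    q = [1.0, 0.0, 0.5]        # hypothetical action values
    anchor = [0.2, 0.5, 0.3]   # hypothetical human-like anchor policy
    for lam in (100.0, 1.0, 0.01):
        print(lam, kl_regularized_policy(q, anchor, lam))
```

A large `lam` keeps the result close to the anchor policy, while a small `lam` concentrates probability on the highest-value action. That trade-off between human-likeness and reward is the one the abstract describes.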

View paper on OpenReview
