We consider the task of building strong but human-like policies in multi-agent decision-making problems, given examples of human behavior. Imitation learning is effective at predicting human actions but may not match the strength of expert humans, while self-play learning and search techniques (e.g. AlphaZero) lead to strong performance but may produce policies that are difficult for humans to understand and coordinate with. We show in chess and Go that applying Monte Carlo tree search with a policy regularized by the KL divergence from an imitation-learned policy produces policies that predict human moves more accurately than the imitation policy while also being stronger. We then introduce a novel regret minimization algorithm, likewise regularized by the KL divergence from an imitation-learned policy, and show that applying this algorithm to no-press Diplomacy yields a policy that matches the human prediction accuracy of imitation learning while being substantially stronger.
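To make the shared idea concrete: both approaches can be viewed as maximizing an objective of the form E_{a~pi}[Q(s,a)] - lambda * KL(pi || tau), where tau is the imitation-learned anchor policy and lambda controls how closely the agent stays to human play; this objective has the closed-form maximizer pi(a) proportional to tau(a) * exp(Q(s,a) / lambda). The Python sketch below is a minimal illustration of that closed form under these assumptions, not the paper's implementation; the function and parameter names (kl_regularized_policy, lam) are ours.

    import numpy as np

    def kl_regularized_policy(q_values, anchor_policy, lam):
        # Closed-form maximizer of E_pi[Q] - lam * KL(pi || anchor):
        # pi(a) is proportional to anchor(a) * exp(Q(a) / lam).
        # Large lam keeps pi near the human-imitation anchor;
        # small lam approaches the greedy argmax-Q policy.
        logits = np.log(anchor_policy) + q_values / lam
        logits -= logits.max()  # subtract max for numerical stability
        policy = np.exp(logits)
        return policy / policy.sum()

    # Toy example: the anchor (human) policy prefers action 0,
    # but the search values (Q) favor action 1.
    q = np.array([0.0, 1.0])
    anchor = np.array([0.9, 0.1])
    print(kl_regularized_policy(q, anchor, lam=0.5))   # pulled toward action 1
    print(kl_regularized_policy(q, anchor, lam=10.0))  # stays near the anchor

Varying lam traces out the trade-off the abstract describes: small values recover a strong but less human-like policy, while large values recover the human-like but weaker imitation policy.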