Title: Accelerating Training in Pommerman with Imitation and Reinforcement Learning

Nov 13, 2019

Hardik Meisheri, Omkar Shelke, Richa Verma, Harshad Khadilkar

Abstract: The Pommerman simulation was recently developed to mimic the classic Japanese game Bomberman, and focuses on competitive gameplay in a multi-agent setting. We focus on the 2$\times$2 team version of Pommerman, developed for a competition at NeurIPS 2018. Our methodology involves first training an agent through imitation learning on a noisy expert policy, then continuing training with the proximal policy optimization (PPO) reinforcement learning algorithm. The basic PPO approach is modified to give a stable transition out of the imitation learning phase, using reward shaping, action filters based on heuristics, and curriculum learning. The proposed methodology is able to beat heuristic and pure reinforcement learning baselines with a combined 100,000 training games, significantly faster than other non-tree-search methods in the literature. We present results against multiple agents provided by the developers of the simulation, including some that we have enhanced. We include a sensitivity analysis over different parameters, and highlight undesirable effects of some strategies that initially appear promising. Since Pommerman is a complex multi-agent competitive environment, the strategies developed here provide insights into several real-world problems with characteristics such as partial observability, decentralized execution (without communication), and very sparse and delayed rewards.

* Presented at the Deep Reinforcement Learning workshop, NeurIPS 2019
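
The abstract mentions action filters based on heuristics as one of the modifications to PPO. As a rough illustration of the general idea only, and not the authors' actual filter, the sketch below masks movement actions that would step off the board or into a blocked cell, then renormalizes the policy's action probabilities. The action indexing, the `blocked` set, and the function names `allowed_actions` and `filter_policy` are all assumptions made for this example.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Assumed action indexing for this sketch (hypothetical, not taken from the paper).
STOP, UP, DOWN, LEFT, RIGHT, BOMB = range(6)
MOVES: Dict[int, Tuple[int, int]] = {
    UP: (-1, 0),
    DOWN: (1, 0),
    LEFT: (0, -1),
    RIGHT: (0, 1),
}


def allowed_actions(
    pos: Tuple[int, int], blocked: Set[Tuple[int, int]], size: int
) -> List[bool]:
    """Return a mask with False for moves that leave the board or hit a blocked cell."""
    mask = [True] * 6  # STOP and BOMB are never masked in this simplified heuristic
    for action, (dr, dc) in MOVES.items():
        r, c = pos[0] + dr, pos[1] + dc
        off_board = not (0 <= r < size and 0 <= c < size)
        if off_board or (r, c) in blocked:
            mask[action] = False
    return mask


def filter_policy(probs: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Zero out masked actions and renormalize the remaining probabilities."""
    kept = [p if ok else 0.0 for p, ok in zip(probs, mask)]
    total = sum(kept)
    if total == 0.0:
        # All probability mass was on masked actions: fall back to uniform over allowed ones.
        n_allowed = sum(1 for ok in mask if ok)
        return [1.0 / n_allowed if ok else 0.0 for ok in mask]
    return [p / total for p in kept]
```

In a training loop, a filter of this kind would typically be applied to the policy output before sampling an action, so that the agent never spends exploration on moves the heuristic already rules out. A real filter would likely also account for bomb blast ranges and timers, which this sketch ignores.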
