Jianshu Wang

Proximal Policy Optimization with Mixed Distributed Training

Sep 08, 2019

Zhenyu Zhang, Xiangfeng Luo, Tong Liu, Shaorong Xie, Jianshu Wang, Wei Wang, Yang Li, Yan Peng

Abstract: Instability and slowness are two main problems in deep reinforcement learning. Although proximal policy optimization (PPO) is state of the art, it still suffers from both. We introduce mixed distributed proximal policy optimization (MDPPO), an improved algorithm based on PPO, and show that it accelerates and stabilizes training. In MDPPO, multiple different policies train simultaneously, and each policy controls several identical agents that interact with environments. Each policy samples its own actions as usual, but the trajectories used for training are collected from all agents rather than from the agents of a single policy. We find that carefully choosing some auxiliary trajectories to train the policies makes the algorithm more stable and faster to converge, especially in environments with sparse rewards.
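
The trajectory mixing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how auxiliary trajectories are chosen, so the rule used here (take the highest-return trajectories produced by the other policies) is an assumption, and the names `Trajectory`, `collect_trajectories`, and `build_training_batch` are hypothetical. Rollouts are replaced by random returns so the example stays self-contained.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    policy_id: int       # policy that sampled the actions
    agent_id: int        # agent (environment copy) that produced the rollout
    total_return: float  # sum of rewards over the episode


def collect_trajectories(num_policies: int, agents_per_policy: int,
                         rng: random.Random) -> List[Trajectory]:
    """Stand-in for environment rollouts: one trajectory per agent.

    Returns are random placeholders, not the output of a real environment.
    """
    return [
        Trajectory(p, a, rng.uniform(0.0, 1.0))
        for p in range(num_policies)
        for a in range(agents_per_policy)
    ]


def build_training_batch(policy_id: int, pool: List[Trajectory],
                         num_auxiliary: int) -> List[Trajectory]:
    """Mix a policy's own trajectories with auxiliary ones from other policies.

    Assumption: auxiliary trajectories are the highest-return trajectories
    sampled by the other policies. The abstract does not specify this rule.
    """
    own = [t for t in pool if t.policy_id == policy_id]
    others = [t for t in pool if t.policy_id != policy_id]
    auxiliary = sorted(others, key=lambda t: t.total_return,
                       reverse=True)[:num_auxiliary]
    return own + auxiliary


rng = random.Random(0)
pool = collect_trajectories(num_policies=3, agents_per_policy=4, rng=rng)
batch = build_training_batch(policy_id=0, pool=pool, num_auxiliary=2)
```

In this sketch every policy would call `build_training_batch` on the same shared pool before its PPO update, so each update sees both its own rollouts and a few rollouts gathered under other policies.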

* ICTAI 2019
