Title: Generalized Proximal Policy Optimization with Sample Reuse

Oct 29, 2021

James Queeney, Ioannis Ch. Paschalidis, Christos G. Cassandras

Abstract: In real-world decision-making tasks, it is critical for data-driven reinforcement learning methods to be both stable and sample efficient. On-policy methods typically generate reliable policy improvement throughout training, while off-policy methods make more efficient use of data through sample reuse. In this work, we combine the theoretically supported stability benefits of on-policy algorithms with the sample efficiency of off-policy algorithms. We develop policy improvement guarantees that are suitable for the off-policy setting, and connect these bounds to the clipping mechanism used in Proximal Policy Optimization. This motivates an off-policy version of the popular algorithm that we call Generalized Proximal Policy Optimization with Sample Reuse. We demonstrate both theoretically and empirically that our algorithm delivers improved performance by effectively balancing the competing goals of stability and sample efficiency.

* To appear in 35th Conference on Neural Information Processing Systems (NeurIPS 2021)
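
For readers unfamiliar with the clipping mechanism the abstract refers to, the following is a minimal sketch of the clipped surrogate objective from standard, on-policy Proximal Policy Optimization. It is not the generalized off-policy variant proposed in the paper, whose details are not given on this page. The function name `ppo_clip_objective`, the argument names, and the default `eps=0.2` are illustrative assumptions.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Average clipped surrogate objective of standard PPO (to be maximized).

    logp_new, logp_old: log-probabilities of the sampled actions under the
    candidate policy and the data-collecting policy (assumed inputs).
    advantages: advantage estimates for the same samples.
    eps: clipping parameter (0.2 is a common choice, assumed here).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the candidate and data-collecting policies.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to the interval [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Taking the minimum removes any incentive to push the probability ratio outside the clipping interval in the direction the advantage favors, which is what ties the objective to keeping each policy update small.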

View paper on OpenReview
