Larry Rudolph

Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO

May 25, 2020

Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry

Abstract: We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms: Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO). Specifically, we investigate the consequences of "code-level optimizations": algorithm augmentations found only in implementations or described as auxiliary details to the core algorithm. Seemingly of secondary importance, such optimizations turn out to have a major impact on agent behavior. Our results show that they (a) are responsible for most of PPO's gain in cumulative reward over TRPO, and (b) fundamentally change how RL methods function. These insights show the difficulty and importance of attributing performance gains in deep reinforcement learning. Code for reproducing our results is available at https://github.com/MadryLab/implementation-matters.

* ICLR 2020 version. arXiv admin note: text overlap with arXiv:1811.02553

Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?

Dec 02, 2018

Andrew Ilyas, Logan Engstrom, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry

Abstract: We study how the behavior of deep policy gradient algorithms reflects the conceptual framework motivating their development. We propose a fine-grained analysis of state-of-the-art methods based on key aspects of this framework: gradient estimation, value prediction, optimization landscapes, and trust region enforcement. We find that from this perspective, the behavior of deep policy gradient algorithms often deviates from what their motivating framework would predict. Our analysis suggests first steps towards solidifying the foundations of these algorithms, and in particular indicates that we may need to move beyond the current benchmark-centric evaluation methodology.
