Title: A functional mirror ascent view of policy gradient methods with function approximation

Aug 12, 2021

Sharan Vaswani, Olivier Bachem, Simone Totaro, Robert Mueller, Matthieu Geist, Marlos C. Machado, Pablo Samuel Castro, Nicolas Le Roux

Abstract: We use functional mirror ascent to propose a general framework (referred to as FMA-PG) for designing policy gradient methods. The functional perspective distinguishes between a policy's functional representation (what are its sufficient statistics) and its parameterization (how are these statistics represented) and naturally results in computationally efficient off-policy updates. For simple policy parameterizations, the FMA-PG framework ensures that the optimal policy is a fixed point of the updates. It also allows us to handle complex policy parameterizations (e.g., neural networks) while guaranteeing policy improvement. Our framework unifies several PG methods and opens the way for designing sample-efficient variants of existing methods. Moreover, it recovers important implementation heuristics (e.g., using forward vs reverse KL divergence) in a principled way. With a softmax functional representation, FMA-PG results in a variant of TRPO with additional desirable properties. It also suggests an improved variant of PPO, whose robustness and efficiency we empirically demonstrate on MuJoCo. Via experiments on simple reinforcement learning problems, we evaluate algorithms instantiated by FMA-PG.
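
To make the term "mirror ascent" concrete, here is a minimal sketch of the generic textbook update on the probability simplex with a negative-entropy mirror map, applied to a single-state (bandit) problem. It is not the FMA-PG algorithm from the paper: the function names, the reward vector, and the step size are all hypothetical choices made for illustration. Each step maximizes the linearized objective minus a KL penalty to the previous policy, which has a closed-form multiplicative-weights solution and never decreases the expected reward in this simple setting.

```python
import math


def mirror_ascent_step(policy, rewards, step_size):
    """One entropic mirror ascent step on the probability simplex.

    Solves max_p <p, rewards> - (1 / step_size) * KL(p || policy),
    whose closed form is p(a) proportional to policy(a) * exp(step_size * rewards[a]).
    """
    logits = [math.log(p) + step_size * r for p, r in zip(policy, rewards)]
    max_logit = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - max_logit) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_reward(policy, rewards):
    """Expected reward of a policy over the arms."""
    return sum(p * r for p, r in zip(policy, rewards))


# Hypothetical three-armed bandit; arm 0 is optimal.
rewards = [1.0, 0.0, 0.5]
policy = [1 / 3, 1 / 3, 1 / 3]

# Track the objective to observe the step-by-step improvement.
history = [expected_reward(policy, rewards)]
for _ in range(50):
    policy = mirror_ascent_step(policy, rewards, step_size=0.5)
    history.append(expected_reward(policy, rewards))
```

In this toy setting the policy concentrates on the best arm, and `history` is non-decreasing. The framework described in the abstract addresses the harder case in which the policy is represented through a parameterization such as a neural network, which this sketch does not attempt to cover.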
