The control variates (CV) method is widely used in policy gradient estimation to reduce the variance of gradient estimators in practice. A control variate is applied by subtracting a baseline function from the state-action value estimates, and the resulting variance-reduced policy gradient is then expected to yield higher learning efficiency. Recent research on control variates with deep neural network policies has focused mainly on scalar-valued baseline functions, while the effect of vector-valued baselines remains under-explored. This paper investigates variance reduction with coordinate-wise and layer-wise control variates constructed from vector-valued baselines for neural network policies. We present experimental evidence that such baselines can attain lower variance than the conventional scalar-valued baseline. We demonstrate how to equip the popular Proximal Policy Optimization (PPO) algorithm with these new control variates, and show that, with proper regularization, the resulting algorithm achieves higher sample efficiency than scalar control variates on continuous control benchmarks.
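To make the construction concrete, the following is a minimal sketch of the two estimators; the symbols $\pi_\theta$, $Q^{\pi}$, $b$, and $b_i$ are generic notation assumed for illustration rather than taken from the paper. With a scalar-valued baseline $b(s)$, the variance-reduced policy gradient estimator is
\[
\nabla_\theta J(\theta) = \mathbb{E}_{s,a}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\,\big(Q^{\pi}(s,a) - b(s)\big)\right],
\]
which is unbiased because $\mathbb{E}_{a \sim \pi_\theta(\cdot \mid s)}\!\left[\nabla_\theta \log \pi_\theta(a \mid s)\right] = 0$. A coordinate-wise control variate instead pairs each coordinate $i$ of the score function with its own baseline $b_i(s)$,
\[
\big[\nabla_\theta J(\theta)\big]_i = \mathbb{E}_{s,a}\!\left[\big[\nabla_\theta \log \pi_\theta(a \mid s)\big]_i\,\big(Q^{\pi}(s,a) - b_i(s)\big)\right],
\]
while a layer-wise control variate shares one baseline across the parameters of each layer of the policy network. Both variants remain unbiased by the same argument, while allowing the baseline to be tuned per coordinate or per layer.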