Title: Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline

May 04, 2024

Wenjia Meng, Qian Zheng, Long Yang, Yilong Yin, Gang Pan



Abstract: Policy-based methods have achieved remarkable success in solving challenging reinforcement learning problems. Among these methods, off-policy policy gradient methods are particularly important because they can benefit from off-policy data. However, these methods suffer from the high variance of the off-policy policy gradient (OPPG) estimator, which results in poor sample efficiency during training. In this paper, we propose an off-policy policy gradient method with the optimal action-dependent baseline (Off-OAB) to mitigate this variance issue. Specifically, this baseline maintains the OPPG estimator's unbiasedness while theoretically minimizing its variance. To enhance practical computational efficiency, we design an approximated version of this optimal baseline. Utilizing this approximation, our method (Off-OAB) aims to decrease the OPPG estimator's variance during policy optimization. We evaluate the proposed Off-OAB method on six representative tasks from OpenAI Gym and MuJoCo, where it demonstrably surpasses state-of-the-art methods on the majority of these tasks.

* 12 pages, 3 figures
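
The abstract's central claim, that a baseline can leave the OPPG estimator unbiased while reducing its variance, can be illustrated on a toy problem. The following is a minimal sketch, not the paper's Off-OAB baseline: it assumes a single-state bandit with a softmax target policy, and it uses the classical variance-minimizing constant baseline rather than an action-dependent one. The names `oppg_moments` and `optimal_constant_baseline`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def weighted_scores(theta, mu):
    """Rows are rho(a) * grad_theta log pi(a) for a softmax policy pi."""
    pi = softmax(theta)
    rho = pi / mu                        # importance weights pi(a) / mu(a)
    score = np.eye(len(theta)) - pi      # row a: e_a - pi
    return rho[:, None] * score

def oppg_moments(theta, mu, rewards, baseline):
    """Exact mean and total variance of the estimator
    g(a) = rho(a) * grad log pi(a) * (r(a) - baseline), with a ~ mu."""
    g = weighted_scores(theta, mu) * (rewards - baseline)[:, None]
    mean = mu @ g
    total_var = mu @ np.sum((g - mean) ** 2, axis=1)
    return mean, total_var

def optimal_constant_baseline(theta, mu, rewards):
    """Constant b minimizing the total variance (assumed toy setting):
    b* = E[||w||^2 r] / E[||w||^2], where w = rho * grad log pi."""
    w2 = np.sum(weighted_scores(theta, mu) ** 2, axis=1)
    return (mu @ (w2 * rewards)) / (mu @ w2)

# Illustrative values only (not taken from the paper).
theta = np.array([0.5, -0.2, 0.1])       # target-policy logits
mu = np.array([0.2, 0.5, 0.3])           # behavior policy
rewards = np.array([1.0, 3.0, 2.0])

mean_0, var_0 = oppg_moments(theta, mu, rewards, 0.0)
b_star = optimal_constant_baseline(theta, mu, rewards)
mean_b, var_b = oppg_moments(theta, mu, rewards, b_star)

print("means equal:", np.allclose(mean_0, mean_b))
print("variance without baseline:", var_0)
print("variance with baseline:   ", var_b)
```

Because the moments are computed exactly by enumerating the actions, the comparison does not depend on sampling noise: the mean is the same with and without the baseline, and only the variance changes.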
