Title: Policy Gradient with Active Importance Sampling

May 09, 2024

Matteo Papini, Giorgio Manganini, Alberto Maria Metelli, Marcello Restelli

Figures 1 to 4 for Policy Gradient with Active Importance Sampling

Abstract: Importance sampling (IS) is a fundamental technique underlying a large number of off-policy reinforcement learning approaches. Policy gradient (PG) methods, in particular, benefit significantly from IS, which enables the effective reuse of previously collected samples and thus increases sample efficiency. Classically, however, IS is employed in RL as a passive tool for re-weighting historical samples. The statistical community, by contrast, employs IS as an active tool, combined with behavioral distributions that can reduce the estimator variance even below that of the sample mean. In this paper, we focus on this second setting by addressing the behavioral policy optimization (BPO) problem: we look for the behavioral policy from which to collect samples so as to reduce the policy gradient variance as much as possible. We provide an iterative algorithm that alternates between the cross-entropy estimation of the minimum-variance behavioral policy and the actual policy optimization, leveraging defensive IS. We theoretically analyze this algorithm, showing that it enjoys a convergence rate of order $O(\epsilon^{-4})$ to a stationary point, while depending on a variance term that is more favorable than that of standard PG methods. We then provide a practical version that is validated numerically, showing advantages in both the policy gradient estimation variance and the learning speed.
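To make the active-IS idea concrete, here is a minimal sketch on a hypothetical one-step toy problem, assuming a Gaussian target policy N(θ, 1) with θ = 0 and a reward r(a) = -(a - 2)²; the problem, the function names (`cross_entropy_fit`, `defensive_is_terms`, `compare`) and the mixture weight α = 0.2 are illustrative assumptions, not the authors' algorithm. It performs a single step rather than the paper's alternation with policy optimization. A Gaussian behavioral policy is fitted by cross-entropy (weighted moment matching) to the variance-minimizing density proportional to π_θ(a)·|g(a)|, where g is the single-sample REINFORCE gradient. Samples are then drawn from the defensive mixture α·π_θ + (1 - α)·q, so every importance weight stays below 1/α.

```python
import math
import random

# Hypothetical toy problem (not from the paper): a one-step task with
# target policy pi_theta = N(THETA, 1) and reward r(a) = -(a - 2)^2.
THETA = 0.0
# J(theta) = E[r(a)] = -((theta - 2)^2 + 1), so dJ/dtheta = -2 (theta - 2).
TRUE_GRAD = -2.0 * (THETA - 2.0)


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def grad_sample(a):
    # REINFORCE single-sample gradient r(a) * d/dtheta log pi_theta(a);
    # with unit variance the score is (a - THETA).
    return -((a - 2.0) ** 2) * (a - THETA)


def mean_var(xs):
    m = math.fsum(xs) / len(xs)
    return m, math.fsum((x - m) ** 2 for x in xs) / len(xs)


def cross_entropy_fit(rng, n):
    # Fit N(mu, sigma^2) to q*(a) proportional to pi_theta(a) * |g(a)|.
    # Minimising KL(q* || q) over Gaussians is weighted moment matching,
    # estimated with samples from pi_theta weighted by |g(a)|.
    actions = [rng.gauss(THETA, 1.0) for _ in range(n)]
    weights = [abs(grad_sample(a)) for a in actions]
    total = math.fsum(weights)
    mu = math.fsum(w * a for w, a in zip(weights, actions)) / total
    var = math.fsum(w * (a - mu) ** 2 for w, a in zip(weights, actions)) / total
    return mu, math.sqrt(var)


def defensive_is_terms(rng, n, mu, sigma, alpha):
    # Sample from the defensive mixture q_alpha = alpha*pi_theta + (1-alpha)*q
    # and return the importance-weighted terms (p / q_alpha) * g(a).
    terms = []
    for _ in range(n):
        if rng.random() < alpha:
            a = rng.gauss(THETA, 1.0)
        else:
            a = rng.gauss(mu, sigma)
        p = normal_pdf(a, THETA, 1.0)
        q_alpha = alpha * p + (1.0 - alpha) * normal_pdf(a, mu, sigma)
        terms.append(p / q_alpha * grad_sample(a))  # weight is at most 1/alpha
    return terms


def compare(seed=0, n=100_000, alpha=0.2):
    rng = random.Random(seed)
    on_mean, on_var = mean_var([grad_sample(rng.gauss(THETA, 1.0)) for _ in range(n)])
    mu, sigma = cross_entropy_fit(rng, n)
    is_mean, is_var = mean_var(defensive_is_terms(rng, n, mu, sigma, alpha))
    return {
        "behavioral": (mu, sigma),
        "on_policy_mean": on_mean,
        "on_policy_var": on_var,
        "active_is_mean": is_mean,
        "active_is_var": is_var,
    }
```

Calling `compare()` returns the fitted behavioral parameters and the per-sample mean and variance of both gradient estimators. On this toy problem the exact cross-entropy fit is N(-4/3, 8/9) and the on-policy per-sample variance is 87 analytically. Both estimators are unbiased for the true gradient of 4, but the defensive-IS terms should show a substantially smaller variance. The mixture component matters here: the fitted Gaussian has lighter right tails than π_θ, so without it the weights p/q could grow without bound.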