Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mahdi Kallel

Augmented Bayesian Policy Search

Jul 05, 2024

Mahdi Kallel, Debabrota Basu, Riad Akrour, Carlo D'Eramo

Figure 1 for Augmented Bayesian Policy Search

Figure 2 for Augmented Bayesian Policy Search

Figure 3 for Augmented Bayesian Policy Search

Figure 4 for Augmented Bayesian Policy Search

Abstract:Deterministic policies are often preferred over stochastic ones when implemented on physical systems. They can prevent erratic and harmful behaviors while being easier to implement and interpret. However, in practice, exploration is largely performed by stochastic policies. First-order Bayesian Optimization (BO) methods offer a principled way of performing exploration using deterministic policies. This is done through a learned probabilistic model of the objective function and its gradient. Nonetheless, such approaches treat policy search as a black-box problem, and thus, neglect the reinforcement learning nature of the problem. In this work, we leverage the performance difference lemma to introduce a novel mean function for the probabilistic model. This results in augmenting BO methods with the action-value function. Hence, we call our method Augmented Bayesian Search~(ABS). Interestingly, this new mean function enhances the posterior gradient with the deterministic policy gradient, effectively bridging the gap between BO and policy gradient methods. The resulting algorithm combines the convenience of the direct policy search with the scalability of reinforcement learning. We validate ABS on high-dimensional locomotion problems and demonstrate competitive performance compared to existing direct policy search schemes.

* Accepted to the International Conference on Learning Representations (ICLR) 2024

Via

Access Paper or Ask Questions