Title: Learning in complex action spaces without policy gradients

Oct 08, 2024

Arash Tavakoli, Sina Ghiassian, Nemanja Rakićević

Abstract: Conventional wisdom suggests that policy gradient methods are better suited to complex action spaces than action-value methods. However, foundational studies have shown equivalences between these paradigms in small and finite action spaces (O'Donoghue et al., 2017; Schulman et al., 2017a). This raises the question of why their computational applicability and performance diverge as the complexity of the action space increases. We hypothesize that the apparent superiority of policy gradients in such settings stems not from intrinsic qualities of the paradigm, but from universal principles that can also be applied to action-value methods to serve a similar function. We identify three such principles and provide a framework for incorporating them into action-value methods. To support our hypothesis, we instantiate this framework in what we term QMLE, for Q-learning with maximum likelihood estimation. Our results show that QMLE can be applied to complex action spaces at a controllable computational cost comparable to that of policy gradient methods, all without using policy gradients. Furthermore, QMLE performs strongly on the DeepMind Control Suite, even when compared with state-of-the-art methods such as DMPO and D4PG.
