Abstract: We explore deep reinforcement learning in a parameterized action space. Specifically, we investigate how to achieve sample-efficient end-to-end training in these tasks. We propose a new compact architecture for such tasks, in which the parameter policy is conditioned on the output of the discrete action policy. We also propose two new methods, based on the state-of-the-art algorithms Trust Region Policy Optimization (TRPO) and Stochastic Value Gradient (SVG), to train such an architecture. We demonstrate that these methods outperform the state-of-the-art method, Parameterized Action DDPG, on test domains.
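Below is a minimal sketch, in PyTorch, of what a conditioned parameterized-action architecture of the kind the abstract describes could look like: the continuous-parameter head receives both the shared state features and the output of the discrete-action head, so the parameters are conditioned on the discrete policy. All class names, layer sizes, and dimensions here are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch of a parameterized-action policy whose continuous-parameter
# head is conditioned on the discrete-action head's output. Sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParameterizedActionPolicy(nn.Module):
    def __init__(self, state_dim, n_discrete, param_dim, hidden=64):
        super().__init__()
        self.trunk = nn.Linear(state_dim, hidden)            # shared state encoder
        self.discrete_head = nn.Linear(hidden, n_discrete)   # logits over discrete actions
        # the parameter head sees the state features AND the discrete-action distribution
        self.param_head = nn.Linear(hidden + n_discrete, param_dim)

    def forward(self, state):
        h = F.relu(self.trunk(state))
        logits = self.discrete_head(h)
        probs = F.softmax(logits, dim=-1)
        # condition the continuous action parameters on the discrete policy's output
        params = torch.tanh(self.param_head(torch.cat([h, probs], dim=-1)))
        return probs, params

# Usage (illustrative): sample a discrete action from `probs`, then execute it with
# the task-specific slice of `params` associated with that action.
policy = ParameterizedActionPolicy(state_dim=10, n_discrete=4, param_dim=6)
probs, params = policy(torch.randn(1, 10))
```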
Abstract: Policy gradient methods are often applied to reinforcement learning in continuous multiagent games. These methods perform local search in the joint action space, and as we show, they are susceptible to a game-theoretic pathology known as relative overgeneralization. To resolve this issue, we propose Multiagent Soft Q-learning, which can be seen as the analogue of applying Q-learning to continuous controls. We compare our method to MADDPG, a state-of-the-art approach, and show that our method achieves better coordination in multiagent cooperative tasks, converging to better local optima in the joint action space.
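As context for the abstract above, here is a minimal PyTorch sketch of the soft Q-learning building block for continuous actions, where the soft value V(s) = alpha * log ∫ exp(Q(s,a)/alpha) da is approximated by importance sampling with uniformly sampled actions. This is a generic single-agent illustration under assumed network sizes, temperature, and action range, not the paper's multiagent algorithm.

```python
# Sketch of a soft value estimate used in soft Q-learning targets:
# V(s) ≈ alpha * log( (1/N) * sum_i exp(Q(s, a_i)/alpha) / p(a_i) ), a_i ~ Uniform.
# All hyperparameters and architectures below are illustrative assumptions.
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

def soft_value(q_net, states, action_dim, alpha=0.1, n_samples=32, a_low=-1.0, a_high=1.0):
    """Importance-sampled estimate of the soft state value over a box action space."""
    batch, state_dim = states.shape
    # sample actions uniformly over the (assumed) box [a_low, a_high]^action_dim
    a = torch.rand(batch, n_samples, action_dim) * (a_high - a_low) + a_low
    s = states.unsqueeze(1).expand(-1, n_samples, -1)
    q = q_net(s.reshape(-1, state_dim), a.reshape(-1, action_dim)).reshape(batch, n_samples)
    log_p = -action_dim * torch.log(torch.tensor(a_high - a_low))  # log-density of the uniform proposal
    return alpha * (torch.logsumexp(q / alpha - log_p, dim=1)
                    - torch.log(torch.tensor(float(n_samples))))

# A soft Bellman target for a transition (s, a, r, s') would then be y = r + gamma * soft_value(q_net, s').
q_net = QNet(state_dim=6, action_dim=2)
v = soft_value(q_net, torch.randn(5, 6), action_dim=2)
```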