Abstract: We propose a novel hierarchical agent architecture for multi-agent reinforcement learning with concealed information. The hierarchy is grounded in the concealed information about other players, which resolves the chicken-and-egg nature of option discovery. We factorise the value function over a latent representation of the concealed information and then re-use this latent space to factorise the policy into options. Low-level policies (options) are trained to respond to particular states of other agents, grouped by the latent representation, while the top level (meta-policy) learns to infer the latent representation from its own observations and thereby select the appropriate option. This grounding facilitates credit assignment across the levels of the hierarchy. We show that this improves generalisation (performance against a held-out set of pre-trained competitors while training in self- or population-play) and the resolution of social dilemmas in self-play.
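
To make the architecture concrete, the following is a minimal sketch (not the authors' implementation) of how the pieces described in the abstract could fit together: an encoder over the concealed information grounds a discrete latent, the value function is factorised over that latent, one low-level option policy is kept per latent state, and a meta-policy infers the latent from the agent's own observation at execution time. All module names, dimensions, and the use of PyTorch are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalAgent(nn.Module):
    """Sketch: options grounded in a latent over other players' concealed state."""

    def __init__(self, obs_dim, concealed_dim, n_latents, n_actions, hidden=64):
        super().__init__()
        # Encoder over the concealed information about other players
        # (assumed available only during training); its output indexes the options.
        self.concealed_encoder = nn.Sequential(
            nn.Linear(concealed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_latents),
        )
        # Value function factorised over the latent: one value head per latent.
        self.value_heads = nn.Linear(obs_dim, n_latents)
        # Meta-policy: infers a distribution over the latent (i.e. which option
        # to execute) from the agent's own observation.
        self.meta_policy = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_latents),
        )
        # One low-level policy (option) per latent state of the other agents.
        self.options = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, n_actions))
            for _ in range(n_latents)
        )

    def forward(self, obs, concealed=None):
        # During training the latent is grounded in the concealed information;
        # at execution time it is inferred by the meta-policy from obs alone.
        if concealed is not None:
            latent_logits = self.concealed_encoder(concealed)
        else:
            latent_logits = self.meta_policy(obs)
        latent_probs = F.softmax(latent_logits, dim=-1)            # p(z | .)
        option_logits = torch.stack(
            [opt(obs) for opt in self.options], dim=1)             # [B, Z, A]
        # Mixture policy: marginalise the option policies over the latent.
        action_probs = torch.einsum(
            'bz,bza->ba', latent_probs, F.softmax(option_logits, dim=-1))
        # Factorised value: V(s) = sum_z p(z | .) V_z(s).
        value = (latent_probs * self.value_heads(obs)).sum(-1, keepdim=True)
        return action_probs, value, latent_probs
```

Because the same latent distribution weights both the value heads and the option policies, credit for a return can be attributed jointly to the choice of option (meta-policy) and to the option's actions, which is one reading of the credit-assignment claim in the abstract.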