Matthew J. Sargent

Frequency and Generalisation of Periodic Activation Functions in Reinforcement Learning

Jul 09, 2024

Augustine N. Mavor-Parker, Matthew J. Sargent, Caswell Barry, Lewis Griffin, Clare Lyle

Abstract: Periodic activation functions, often referred to as learned Fourier features, have been widely demonstrated to improve sample efficiency and stability in a variety of deep RL algorithms. Potentially incompatible hypotheses have been made about the source of these improvements. One is that periodic activations learn low-frequency representations and as a result avoid overfitting to bootstrapped targets. Another is that periodic activations learn high-frequency representations that are more expressive, allowing networks to quickly fit complex value functions. We analyse these claims empirically, finding that periodic representations consistently converge to high frequencies regardless of their initialisation frequency. We also find that while periodic activation functions improve sample efficiency, they exhibit worse generalization on states with added observation noise, especially when compared to otherwise equivalent networks with ReLU activation functions. Finally, we show that weight decay regularization is able to partially offset the overfitting of periodic activation functions, delivering value functions that learn quickly while also generalizing.
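To make the terminology in the abstract concrete, here is a minimal sketch of a periodic layer next to a ReLU layer. It assumes one common formulation of learned Fourier features (the sine and cosine of a linear pre-activation, concatenated); the abstract does not state the exact form used in the paper. The function names, the Gaussian initialisation, and the `scale` parameter are illustrative assumptions, with `scale` standing in for the "initialisation frequency" the abstract mentions.

```python
import math
import random


def init_weights(n_units, n_inputs, scale, seed=0):
    # Hypothetical initialiser: Gaussian weights with standard deviation `scale`.
    # A larger scale gives pre-activations that vary faster with the input,
    # i.e. a higher initial frequency for the periodic features.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(n_inputs)]
            for _ in range(n_units)]


def pre_activations(x, weights, biases):
    # Linear map w . x + b for every unit.
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
            for w, b in zip(weights, biases)]


def fourier_features(x, weights, biases):
    # Assumed periodic layer: all sines first, then all cosines.
    pre = pre_activations(x, weights, biases)
    return [math.sin(p) for p in pre] + [math.cos(p) for p in pre]


def relu_features(x, weights, biases):
    # Baseline layer with the same weights but a ReLU non-linearity.
    return [max(0.0, p) for p in pre_activations(x, weights, biases)]
```

In this sketch the periodic layer outputs twice as many features as it has units, and every feature is bounded in [-1, 1], whereas the ReLU outputs are unbounded above. In a trained network the weights would be learned, so the effective frequency can drift away from whatever `scale` was used at initialisation.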


Temporally Extended Successor Representations

Sep 25, 2022

Matthew J. Sargent, Peter J. Bentley, Caswell Barry, William de Cothi

Abstract: We present a temporally extended variation of the successor representation, which we term t-SR. t-SR captures the expected state transition dynamics of temporally extended actions by constructing successor representations over primitive action repeats. This form of temporal abstraction does not learn a top-down hierarchy of pertinent task structures, but rather a bottom-up composition of coupled actions and action repetitions. This reduces the number of decisions required in control without learning a hierarchical policy. As such, t-SR directly considers the time horizon of temporally extended action sequences without the need for predefined or domain-specific options. We show that in environments with dynamic reward structure, t-SR is able to leverage both the flexibility of the successor representation and the abstraction afforded by temporally extended actions. Thus, in a series of sparsely rewarded gridworld environments, t-SR optimally adapts learnt policies far faster than comparable value-based, model-free reinforcement learning methods. We also show that the manner in which t-SR learns to solve these tasks requires the learnt policy to be sampled consistently less often than non-temporally extended policies.

* Presented at the 5th Multi-Disciplinary Conference on Reinforcement Learning and Decision Making (RLDM) 2022
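As a rough illustration of what a successor representation over action repeats could look like, the sketch below performs one tabular temporal-difference update for a key of the form (state, action, repeats). This is an assumed SMDP-style update written for illustration only; the abstract does not give the update rule, and the function name, the table layout, and all parameters here are hypothetical.

```python
def sr_repeat_update(M, key, visited, next_row, gamma, alpha):
    """One TD update of the successor row stored under `key`.

    M        : dict mapping (state, action, repeats) to a list of expected
               discounted state occupancies (hypothetical table layout).
    visited  : states occupied on each of the k repeated steps,
               with visited[0] being the state where the repeat started.
    next_row : successor row of the choice made in the state reached
               after the k repeats (used for bootstrapping).
    """
    k = len(visited)
    # Discounted occupancy accumulated while the action was repeated.
    target = [0.0] * len(next_row)
    for i, s in enumerate(visited):
        target[s] += gamma ** i
    # Bootstrap from the row that follows, discounted by the k elapsed steps.
    target = [t + (gamma ** k) * n for t, n in zip(target, next_row)]
    # Move the stored row towards the target.
    M[key] = [m + alpha * (t - m) for m, t in zip(M[key], target)]
    return M[key]
```

Under these assumptions, a single decision covers `k` environment steps, which is one way to read the abstract's claim that the policy is sampled less often than a non-temporally extended one.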
