Simon Ramstedt

Reinforcement Learning with Random Delays

Oct 08, 2020

Simon Ramstedt, Yann Bouteiller, Giovanni Beltrame, Christopher Pal, Jonathan Binas

Abstract: Action and observation delays commonly occur in many Reinforcement Learning applications, such as remote control scenarios. We study the anatomy of randomly delayed environments, and show that partially resampling trajectory fragments in hindsight allows for off-policy multi-step value estimation. We apply this principle to derive Delay-Correcting Actor-Critic (DCAC), an algorithm based on Soft Actor-Critic with significantly better performance in environments with delays. This is shown theoretically and also demonstrated practically on a delay-augmented version of the MuJoCo continuous control benchmark.
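To make the notion of a randomly delayed environment concrete, the sketch below wraps a toy environment so that each action reaches it only after a random number of steps. It is a minimal illustration, not the paper's formulation: `ToyEnv`, `RandomActionDelay`, the uniform delay distribution, and the rule that the most recently sent action among those that have arrived is applied are all assumptions made here. It models action delays only and does not implement DCAC or the hindsight resampling described in the abstract.

```python
import random


class ToyEnv:
    """Hypothetical 1-D environment: the state is a position, an action moves it."""

    def reset(self):
        self.pos = 0.0
        return self.pos

    def step(self, action):
        self.pos += action
        return self.pos, -abs(self.pos)  # (observation, reward)


class RandomActionDelay:
    """Delivers each action to the environment after a random number of steps.

    Assumption: delays are uniform in 0..max_delay, and the environment keeps
    applying the most recently sent action among those that have arrived.
    """

    def __init__(self, env, max_delay, default_action=0.0, seed=0):
        self.env = env
        self.max_delay = max_delay
        self.default_action = default_action
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.in_flight = []  # entries: (arrival_step, sent_step, action)
        self.applied_sent_step = -1  # no agent action has arrived yet
        self.applied_action = self.default_action
        return self.env.reset()

    def step(self, action):
        # Send the new action with a freshly sampled delay.
        delay = self.rng.randint(0, self.max_delay)
        self.in_flight.append((self.t + delay, self.t, action))

        # Collect arrivals; an arrived action replaces the applied one only
        # if it was sent later. Outdated arrivals are dropped.
        still_in_flight = []
        for arrival, sent, act in self.in_flight:
            if arrival > self.t:
                still_in_flight.append((arrival, sent, act))
            elif sent > self.applied_sent_step:
                self.applied_sent_step, self.applied_action = sent, act
        self.in_flight = still_in_flight

        obs, reward = self.env.step(self.applied_action)
        self.t += 1
        return obs, reward
```

With `max_delay=0` the wrapper behaves like the undelayed environment. With a larger `max_delay`, the action in effect at a given step may have been sent up to `max_delay` steps earlier, which is the kind of mismatch a delay-aware algorithm has to account for.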

Real-Time Reinforcement Learning

Dec 12, 2019

Simon Ramstedt, Christopher Pal

Abstract: Markov Decision Processes (MDPs), the mathematical framework underlying most algorithms in Reinforcement Learning (RL), are often used in a way that wrongfully assumes that the state of an agent's environment does not change during action selection. As RL systems based on MDPs begin to find application in real-world safety-critical situations, this mismatch between the assumptions underlying classical MDPs and the reality of real-time computation may lead to undesirable outcomes. In this paper, we introduce a new framework, in which states and actions evolve simultaneously, and show how it is related to the classical MDP formulation. We analyze existing algorithms under the new real-time formulation and show why they are suboptimal when used in real-time. We then use those insights to create a new algorithm, Real-Time Actor-Critic (RTAC), that outperforms the existing state-of-the-art continuous control algorithm Soft Actor-Critic both in real-time and non-real-time settings. Code and videos can be found at https://github.com/rmst/rtrl.

* Neural Information Processing Systems (2019)
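The idea that the environment keeps changing while the agent selects an action can be sketched with a small wrapper. The sketch assumes one plausible reading of "states and actions evolve simultaneously": the action chosen at one step only takes effect at the next step, and the agent observes the pair (environment state, pending action). `ToyEnv`, `RealTimeInteraction`, and `initial_action` are hypothetical names introduced for illustration; this is not the code from the linked repository and it does not implement RTAC.

```python
class ToyEnv:
    """Hypothetical 1-D environment: the state is a position, an action moves it."""

    def reset(self):
        self.pos = 0.0
        return self.pos

    def step(self, action):
        self.pos += action
        return self.pos, -abs(self.pos)  # (observation, reward)


class RealTimeInteraction:
    """Applies each action one step late and exposes the pending action.

    Assumption: while the agent computes a new action, the environment
    advances using the action selected during the previous step.
    """

    def __init__(self, env, initial_action=0.0):
        self.env = env
        self.initial_action = initial_action

    def reset(self):
        self.pending_action = self.initial_action
        # Augmented state: (environment state, action about to be applied).
        return self.env.reset(), self.pending_action

    def step(self, action):
        # The environment moves with the previously selected action ...
        obs, reward = self.env.step(self.pending_action)
        # ... and the newly selected action waits for the next step.
        self.pending_action = action
        return (obs, self.pending_action), reward
```

Including the pending action in the observation keeps the process Markovian from the agent's point of view, since the next environment state depends on that pending action rather than on the one being selected.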
