Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:TOPS: Transition-based VOlatility-controlled Policy Search and its Global Convergence

Jan 24, 2022

Liangliang Xu, Aiwen Jiang, Daoming Lyu, Bo Liu

Figure 1 for TOPS: Transition-based VOlatility-controlled Policy Search and its Global Convergence

Figure 2 for TOPS: Transition-based VOlatility-controlled Policy Search and its Global Convergence

Figure 3 for TOPS: Transition-based VOlatility-controlled Policy Search and its Global Convergence

Figure 4 for TOPS: Transition-based VOlatility-controlled Policy Search and its Global Convergence

Share this with someone who'll enjoy it:

Abstract:Risk-averse problems receive far less attention than risk-neutral control problems in reinforcement learning, and existing risk-averse approaches are challenging to deploy to real-world applications. One primary reason is that such risk-averse algorithms often learn from consecutive trajectories with a certain length, which significantly increases the potential danger of causing dangerous failures in practice. This paper proposes Transition-based VOlatility-controlled Policy Search (TOPS), a novel algorithm that solves risk-averse problems by learning from (possibly non-consecutive) transitions instead of only consecutive trajectories. By using an actor-critic scheme with an overparameterized two-layer neural network, our algorithm finds a globally optimal policy at a sublinear rate with proximal policy optimization and natural policy gradient, with effectiveness comparable to the state-of-the-art convergence rate of risk-neutral policy-search methods. The algorithm is evaluated on challenging Mujoco robot simulation tasks under the mean-variance evaluation metric. Both theoretical analysis and experimental results demonstrate a state-of-the-art level of risk-averse policy search methods.

View paper on

Share this with someone who'll enjoy it:

Title:TOPS: Transition-based VOlatility-controlled Policy Search and its Global Convergence

Paper and Code