Abstract: The majority of off-policy reinforcement learning algorithms use overestimation bias control techniques. Most of these techniques are rooted in heuristics and primarily address the consequences of overestimation rather than its fundamental origins. In this work, we present a novel approach to bias correction, similar in spirit to Double Q-Learning. We propose using a policy in the form of a mixture with two components. Each policy component is maximized and assessed by separate networks, which removes any basis for overestimation bias. Our approach shows promising, near-SOTA results on a small set of MuJoCo environments.
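The decoupling described above can be illustrated with a minimal, hypothetical sketch over a discrete action set: each mixture component greedily maximizes its own critic but is assessed by the other, so the action selection and the value estimate never come from the same network. The arrays `q1`, `q2` and the uniform 50/50 mixture weights are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Two independent critics' Q-value estimates over three actions (toy numbers).
q1 = np.array([0.1, 0.9, 0.3])
q2 = np.array([0.5, 0.2, 0.8])

# Component i is maximized by its own critic q_i ...
a1, a2 = int(np.argmax(q1)), int(np.argmax(q2))

# ... but assessed by the *other* critic, decoupling selection from evaluation
# (the same idea that underlies Double Q-Learning).
value_component_1 = q2[a1]  # component 1's action, evaluated by critic 2
value_component_2 = q1[a2]  # component 2's action, evaluated by critic 1

# Value of the uniform two-component mixture policy (assumed 50/50 weights).
mixture_value = 0.5 * (value_component_1 + value_component_2)
print(mixture_value)  # → 0.25
```

Because the maximizing network never scores its own chosen action, the positive noise that normally inflates `max_a Q(s, a)` has no channel through which to propagate.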
Abstract: Bias correction techniques are used by most of the high-performing methods for off-policy reinforcement learning. However, these techniques rely on a pre-defined bias correction policy that is either not flexible enough or requires environment-specific hyperparameter tuning. In this work, we present a simple data-driven approach for guiding bias correction. We demonstrate its effectiveness on Truncated Quantile Critics -- a state-of-the-art continuous control algorithm. The proposed technique adjusts the bias correction across environments automatically. As a result, it eliminates the need for an extensive hyperparameter search, significantly reducing the required number of environment interactions and computation.
Abstract: The overestimation bias is one of the major impediments to accurate off-policy learning. This paper investigates a novel way to alleviate the overestimation bias in a continuous control setting. Our method -- Truncated Quantile Critics (TQC) -- blends three ideas: distributional representation of a critic, truncation of the critics' predictions, and ensembling of multiple critics. Distributional representation and truncation allow for arbitrarily granular overestimation control, while ensembling provides additional score improvements. TQC outperforms the current state of the art on all environments from the continuous control benchmark suite, demonstrating a 25% improvement on the most challenging Humanoid environment.
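The truncation-plus-ensembling idea from the abstract can be sketched as follows: each critic predicts a set of quantile atoms of the return distribution; the atoms from all critics are pooled, the largest few are discarded, and the rest are averaged into a target. This is only a minimal illustration of the target computation; the function name `tqc_target` and the per-critic drop count are assumptions for the sketch, not the paper's exact interface.

```python
import numpy as np

def tqc_target(critic_atoms, drop_per_critic):
    """Sketch of the truncated-ensemble target in TQC.

    critic_atoms: list of 1-D arrays, one per critic, each holding that
        critic's quantile estimates (atoms) of the return distribution.
    drop_per_critic: number of the largest atoms to discard per critic;
        varying this gives fine-grained control over how aggressively
        the overestimation is corrected.
    """
    pooled = np.sort(np.concatenate(critic_atoms))  # pool atoms from the ensemble
    k = drop_per_critic * len(critic_atoms)         # total atoms to truncate
    kept = pooled[: len(pooled) - k]                # drop the k largest atoms
    return kept.mean()                              # average survivors into a target

# Toy usage: two critics with three atoms each, dropping one atom per critic.
atoms = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])]
print(tqc_target(atoms, drop_per_critic=1))  # → 2.5
```

Because truncation operates on individual atoms rather than on whole critic outputs, the correction granularity scales with the number of atoms, which is what enables the fine-grained bias control the abstract refers to.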