
Igor Kuznetsov

Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization

Jun 25, 2022

Igor Kuznetsov

Figures 1 to 4 for Guided Exploration in Reinforcement Learning via Monte Carlo Critic Optimization

Abstract: Deep deterministic off-policy algorithms have been applied effectively to challenging continuous control problems. However, current approaches rely on random noise for exploration. This has several weaknesses, such as the need for manual tuning on each task and the absence of exploratory calibration during training. We address these challenges by proposing a novel guided exploration method that uses a differential directional controller to incorporate scalable exploratory action correction. As the controller, we present an ensemble of Monte Carlo Critics that provides the exploratory direction. The proposed method improves on the traditional exploration scheme by adjusting exploration dynamically. We then present a novel algorithm that exploits the proposed directional controller to modify both the policy and the critic. The presented algorithm outperforms modern reinforcement learning algorithms across a variety of problems from the DMControl suite.

* Accepted at the Decision Awareness in Reinforcement Learning Workshop, ICML 2022
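
The abstract describes correcting exploratory actions with a direction supplied by an ensemble of Monte Carlo critics. The sketch below is a hypothetical illustration, not the authors' implementation, and it rests on three assumptions:

- Each critic has already been fit to discounted Monte Carlo returns, as computed by `monte_carlo_returns`.
- The exploratory direction is the normalized gradient, with respect to the action, of the ensemble's mean estimate. Central finite differences approximate this gradient.
- The step `scale` is fixed, whereas the paper describes the correction as scalable and dynamically adjusted.

All function names and parameters are illustrative.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical critic type: any callable mapping (state, action) to an estimated return.
Critic = Callable[[Sequence[float], Sequence[float]], float]


def monte_carlo_returns(rewards: Sequence[float], gamma: float = 0.99) -> List[float]:
    """Discounted return-to-go for each step of one episode (the assumed critic targets)."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def ensemble_mean(critics: List[Critic], state: Sequence[float], action: Sequence[float]) -> float:
    """Average estimate of the critic ensemble at (state, action)."""
    return sum(c(state, action) for c in critics) / len(critics)


def action_gradient(critics: List[Critic], state: Sequence[float],
                    action: Sequence[float], eps: float = 1e-4) -> List[float]:
    """Central finite-difference gradient of the ensemble mean w.r.t. each action dimension."""
    grad = []
    for i in range(len(action)):
        up = list(action)
        up[i] += eps
        down = list(action)
        down[i] -= eps
        diff = ensemble_mean(critics, state, up) - ensemble_mean(critics, state, down)
        grad.append(diff / (2 * eps))
    return grad


def guided_action(critics: List[Critic], state: Sequence[float], action: Sequence[float],
                  scale: float = 0.1, low: float = -1.0, high: float = 1.0) -> List[float]:
    """Step `action` a distance `scale` along the normalized ascent direction, then clip.

    Hypothetical sketch: `scale` is fixed here, while the paper adapts the correction.
    """
    grad = action_gradient(critics, state, action)
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(action)  # no preferred direction: leave the action unchanged
    return [min(high, max(low, a + scale * g / norm)) for a, g in zip(action, grad)]
```

Consider two toy quadratic critics whose optima lie at a[0] = 0.5 and a[0] = 0.3. For these, `guided_action(critics, state, [0.0, 0.0], scale=0.1)` moves the action to approximately [0.1, 0.0]. Normalizing the gradient means the step length is set by `scale` alone, which is one simple way to make the correction scalable.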


Solving Continuous Control with Episodic Memory

Jun 16, 2021

Igor Kuznetsov, Andrey Filchenkov

Figures 1 to 4 for Solving Continuous Control with Episodic Memory

Abstract: Episodic memory lets reinforcement learning algorithms remember and exploit promising past experience to improve agent performance. Previous work on memory mechanisms shows that episodic data structures improve sample efficiency in problems with discrete actions. Applying episodic memory to continuous control with a large action space is not trivial. Our study aims to answer the question: can episodic memory be used to improve an agent's performance in continuous control? Our proposed algorithm combines episodic memory with the Actor-Critic architecture by modifying the critic's objective. We further improve performance by introducing episodic-based prioritization of the replay buffer. We evaluate our algorithm on OpenAI Gym domains and show greater sample efficiency than state-of-the-art model-free off-policy algorithms.

* To appear in the 30th International Joint Conference on Artificial Intelligence (IJCAI 2021)
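
One way to read "modifying the critic's objective" is as adding a term that pulls the critic's estimate toward the best return stored in episodic memory. The sketch below assumes exactly that, using two hypothetical pieces:

- An `EpisodicMemory` stores the highest discounted Monte Carlo return observed for each state-action pair. Keys are rounded state-action tuples, a crude stand-in for the nearest-neighbour lookup a real continuous-action memory would need.
- A `critic_loss` function adds a memory term, weighted by `alpha`, to the usual squared TD error.

The weighting, the key discretization, and all names are assumptions. The replay-buffer prioritization mentioned in the abstract is not shown.

```python
from typing import Dict, Optional, Sequence, Tuple


class EpisodicMemory:
    """Hypothetical episodic memory: best discounted return seen per rounded (state, action) key."""

    def __init__(self, decimals: int = 1):
        self.decimals = decimals  # rounding stands in for a nearest-neighbour lookup
        self.table: Dict[Tuple[float, ...], float] = {}

    def _key(self, state: Sequence[float], action: Sequence[float]) -> Tuple[float, ...]:
        return tuple(round(v, self.decimals) for v in (*state, *action))

    def add_episode(self, states, actions, rewards, gamma: float = 0.99) -> None:
        """Compute Monte Carlo returns backwards and keep the maximum per key."""
        g = 0.0
        for t in reversed(range(len(rewards))):
            g = rewards[t] + gamma * g
            key = self._key(states[t], actions[t])
            self.table[key] = max(self.table.get(key, float("-inf")), g)

    def lookup(self, state: Sequence[float], action: Sequence[float]) -> Optional[float]:
        return self.table.get(self._key(state, action))


def critic_loss(q_pred: float, reward: float, q_next: float, memory_value: Optional[float],
                gamma: float = 0.99, alpha: float = 0.1) -> float:
    """Single-sample squared TD error plus an assumed memory term weighted by `alpha`."""
    td_target = reward + gamma * q_next
    loss = (q_pred - td_target) ** 2
    if memory_value is not None:  # only when the memory has seen this pair
        loss += alpha * (q_pred - memory_value) ** 2
    return loss
```

Keeping the maximum return per key matches the abstract's idea of remembering *promising* experience. A lower-return visit to the same pair never overwrites a better one.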


Jacobian Policy Optimizations

Jun 13, 2019

Arip Asadulaev, Gideon Stein, Igor Kuznetsov, Andrey Filchenkov

Figures 1 to 4 for Jacobian Policy Optimizations

Abstract: Natural policy gradient algorithms have recently gained widespread recognition due to their strong performance in reinforcement learning tasks. However, their major drawback is the need to keep the policy within a "trust region" while still allowing sufficient exploration. The main objective of this study is to present an approach that models the dynamical isometry of an agent's policy by estimating the conditioning of its Jacobian at individual points in the environment space. We present the Jacobian Policy Optimization algorithm, which dynamically adapts the trust interval according to the policy's conditioning. The suggested approach was tested on a range of Atari environments. This paper offers some important insights into improving policy optimization in reinforcement learning tasks.
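
The abstract ties the size of the trust interval to the conditioning of the policy's Jacobian. The sketch below is a hypothetical illustration under these assumptions:

- It handles only a policy map from two inputs to two outputs, so the singular values can be computed in closed form.
- It estimates the Jacobian with central finite differences.
- It shrinks the trust interval as `base_eps / cond`, a rule chosen for illustration and not given in the abstract.
- The resulting interval feeds a PPO-style clipped surrogate. This objective is our choice, not something the abstract specifies.

```python
import math
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def jacobian_2x2(f: Callable[[List[float]], Sequence[float]], x: Sequence[float],
                 eps: float = 1e-5) -> Matrix:
    """Central finite-difference Jacobian J[i][j] = d f_i / d x_j for a map R^2 -> R^2."""
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        up = list(x)
        up[j] += eps
        down = list(x)
        down[j] -= eps
        f_up, f_down = f(up), f(down)
        for i in range(2):
            J[i][j] = (f_up[i] - f_down[i]) / (2 * eps)
    return J


def condition_number_2x2(J: Matrix) -> float:
    """sigma_max / sigma_min, from the closed-form eigenvalues of the 2x2 matrix J^T J."""
    a = J[0][0] ** 2 + J[1][0] ** 2            # (J^T J)[0][0]
    d = J[0][1] ** 2 + J[1][1] ** 2            # (J^T J)[1][1]
    b = J[0][0] * J[0][1] + J[1][0] * J[1][1]  # (J^T J)[0][1]
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    lam_max, lam_min = mean + radius, mean - radius
    if lam_min <= 0.0:
        return math.inf  # singular Jacobian: infinitely ill-conditioned
    return math.sqrt(lam_max / lam_min)


def adaptive_clip(cond: float, base_eps: float = 0.2, min_eps: float = 0.05) -> float:
    """Hypothetical rule: the trust interval shrinks as the condition number grows."""
    return max(min_eps, base_eps / cond)


def clipped_surrogate(ratio: float, advantage: float, eps: float) -> float:
    """PPO-style clipped objective for one sample, with ratio = pi_new(a|s) / pi_old(a|s)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)
```

A condition number of 1 corresponds to perfect isometry, with all singular values equal, and leaves the interval at `base_eps`. For the linear map x -> (2 x0, 0.5 x1), the condition number is 4 and the interval shrinks to 0.05.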


Linear Distillation Learning

Jun 13, 2019

Arip Asadulaev, Igor Kuznetsov, Andrey Filchenkov

Figures 1 to 4 for Linear Distillation Learning

Abstract: Deep linear networks lack expressive power, but they are mathematically tractable. In our work, we found an architecture in which they are expressive. This paper presents Linear Distillation Learning (LDL), a simple remedy that improves the performance of linear networks through distillation. In deep learning models, distillation often lets a smaller or shallower network mimic a larger model much more accurately. By contrast, a network of the same size trained on one-hot targets cannot achieve results comparable to those of the cumbersome model. In our method, we train students to distill the teacher separately for each class in the dataset. The most striking result is that neural networks without activation functions can achieve high classification scores from a small amount of data on the MNIST and Omniglot datasets. Because of their tractability, linear networks can be used to explain some phenomena observed experimentally in deep non-linear networks. The suggested approach could become a simple and practical instrument, although further studies of linear networks and distillation are yet to be undertaken.
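
The abstract says students are trained to distill the teacher separately for each class. The sketch below is one hypothetical reading, not the authors' method:

- Each class gets one bias-free linear student, fit by full-batch gradient descent to the teacher's output on that class's inputs. The output is scalar, for brevity.
- A test input is assigned to the class whose student reproduces the teacher's output with the smallest squared error.

The teacher here is any callable. How the real teacher is built and how the paper makes predictions are assumptions. All names are illustrative.

```python
from typing import Callable, Dict, List, Sequence

Vec = List[float]
Teacher = Callable[[Sequence[float]], float]  # hypothetical scalar-output teacher


def train_linear_student(xs: List[Vec], targets: List[float],
                         lr: float = 0.05, steps: int = 2000) -> Vec:
    """Full-batch gradient descent on mean squared error for a bias-free linear map w . x."""
    dim = len(xs[0])
    n = len(xs)
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in zip(xs, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j in range(dim):
                grad[j] += 2.0 * err * x[j] / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def fit_per_class_students(data_by_class: Dict[int, List[Vec]], teacher: Teacher,
                           **train_kwargs) -> Dict[int, Vec]:
    """Distill the teacher separately on each class's inputs (no activation functions)."""
    return {c: train_linear_student(xs, [teacher(x) for x in xs], **train_kwargs)
            for c, xs in data_by_class.items()}


def predict(students: Dict[int, Vec], teacher: Teacher, x: Sequence[float]) -> int:
    """Assumed decision rule: pick the class whose student best mimics the teacher at x."""
    def mimic_error(w: Vec) -> float:
        return (sum(wi * xi for wi, xi in zip(w, x)) - teacher(x)) ** 2
    return min(students, key=lambda c: mimic_error(students[c]))
```

As an example, take the stand-in teacher `max(x[0], x[1])` with training points on either side of the diagonal. The class-0 student converges to roughly w = [1, 0] and the class-1 student to roughly w = [0, 1]. Then `predict(students, teacher, [2.0, 0.2])` returns 0.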
