Taisei Hashimoto

Dropout Q-Functions for Doubly Efficient Reinforcement Learning

Oct 05, 2021

Takuya Hiraoka, Takahisa Imagawa, Taisei Hashimoto, Takashi Onishi, Yoshimasa Tsuruoka

Abstract: Randomized ensemble double Q-learning (REDQ) has recently achieved state-of-the-art sample efficiency on continuous-action reinforcement learning benchmarks. This superior sample efficiency is made possible by using a large Q-function ensemble. However, REDQ is much less computationally efficient than non-ensemble counterparts such as Soft Actor-Critic (SAC). To make REDQ more computationally efficient, we propose Dr.Q, a variant of REDQ that uses a small ensemble of dropout Q-functions. Our dropout Q-functions are simple Q-functions equipped with dropout connections and layer normalization. Despite its simplicity of implementation, our experimental results indicate that Dr.Q is doubly (sample and computationally) efficient. It achieved sample efficiency comparable to that of REDQ, much better computational efficiency than REDQ, and computational efficiency comparable to that of SAC.

* Source code: https://github.com/TakuyaHiraoka/Dropout-Q-Functions-for-Doubly-Efficient-Reinforcement-Learning
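
As a rough illustration of what "a simple Q-function equipped with dropout and layer normalization" could look like, here is a minimal pure-Python sketch of the forward pass. It is not taken from the authors' repository. The layer ordering (linear, dropout, layer normalization, ReLU), the hidden size, the dropout rate, and all names such as `DropoutQFunction` are assumptions made for this example only.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: `weights` holds one row of input weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain or bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def relu(x):
    return [max(0.0, v) for v in x]


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class DropoutQFunction:
    """Hypothetical Q(s, a) network: dropout and layer norm after each hidden layer."""

    def __init__(self, state_dim, action_dim, hidden=16, rate=0.01, seed=0):
        self.rng = random.Random(seed)
        self.rate = rate  # assumed value, not taken from the paper
        self.hidden_layers = [
            init_layer(state_dim + action_dim, hidden, self.rng),
            init_layer(hidden, hidden, self.rng),
        ]
        self.out = init_layer(hidden, 1, self.rng)

    def __call__(self, state, action, training=True):
        h = list(state) + list(action)
        for weights, bias in self.hidden_layers:
            h = linear(h, weights, bias)
            h = dropout(h, self.rate, self.rng, training)
            h = layer_norm(h)
            h = relu(h)
        return linear(h, self.out[0], self.out[1])[0]


# A small ensemble (size 2 is an assumption for this example).
ensemble = [DropoutQFunction(state_dim=3, action_dim=2, seed=s) for s in range(2)]
q_values = [q([0.1, 0.2, 0.3], [0.5, -0.5], training=False) for q in ensemble]
```

Each call with `training=True` samples a fresh dropout mask, so repeated evaluations of the same network differ slightly; with `training=False` the output is deterministic.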

Utilizing Skipped Frames in Action Repeats via Pseudo-Actions

May 07, 2021

Taisei Hashimoto, Yoshimasa Tsuruoka

Abstract: In many deep reinforcement learning settings, when an agent takes an action, it repeats the same action a predefined number of times without observing the states until the next action-decision point. This technique of action repetition has several merits in training the agent, but the data between action-decision points (i.e., intermediate frames) are, in effect, discarded. Since the amount of training data is inversely proportional to the interval of action repeats, action repeats can have a negative impact on the sample efficiency of training. In this paper, we propose a simple but effective approach to alleviate this problem by introducing the concept of pseudo-actions. The key idea of our method is to make the transitions between action-decision points usable as training data by considering pseudo-actions. Pseudo-actions for continuous control tasks are obtained as the average of the action sequence straddling an action-decision point. For discrete control tasks, pseudo-actions are computed from learned action embeddings. This method can be combined with any model-free reinforcement learning algorithm that involves the learning of Q-functions. We demonstrate the effectiveness of our approach on both continuous and discrete control tasks in OpenAI Gym.

* Deep Reinforcement Learning Workshop, NeurIPS 2020
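
The continuous-control case described in the abstract can be sketched in a few lines of Python. This is an illustrative reading, not the authors' implementation: it assumes a pseudo-action is the plain mean of the per-frame actions inside a window of one repeat interval that starts at an intermediate frame. The function name `pseudo_action` and its arguments are hypothetical.

```python
def pseudo_action(frame_actions, start, interval):
    """Average the actions executed on frames start .. start + interval - 1.

    `frame_actions` lists the action applied at every environment frame, so
    under action repeat it contains runs of `interval` identical actions.
    A window that starts between two action-decision points straddles the
    next decision point and therefore mixes two different actions.
    """
    window = frame_actions[start:start + interval]
    if start < 0 or len(window) < interval:
        raise ValueError("window does not fit inside the recorded frames")
    dim = len(window[0])
    return [sum(action[i] for action in window) / interval for i in range(dim)]


# Action repeat of 4 frames: action [1.0] for frames 0-3, then [3.0] for frames 4-7.
frames = [[1.0]] * 4 + [[3.0]] * 4
aligned = pseudo_action(frames, 0, 4)     # window covers only the first action
straddling = pseudo_action(frames, 2, 4)  # two frames of each action
```

Under this reading, a window aligned with a decision point recovers the original action, while a window that starts mid-repeat yields a blend weighted by how many frames each action occupies.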
