Abstract:Randomized ensemble double Q-learning (REDQ) has recently achieved state-of-the-art sample efficiency on continuous-action reinforcement learning benchmarks. This superior sample efficiency is possible by using a large Q-function ensemble. However, REDQ is much less computationally efficient than non-ensemble counterparts such as Soft Actor-Critic (SAC). To make REDQ more computationally efficient, we propose a method of improving computational efficiency called Dr.Q, which is a variant of REDQ that uses a small ensemble of dropout Q-functions. Our dropout Q-functions are simple Q-functions equipped with dropout connection and layer normalization. Despite its simplicity of implementation, our experimental results indicate that Dr.Q is doubly (sample and computationally) efficient. It achieved comparable sample efficiency with REDQ and much better computational efficiency than REDQ and comparable computational efficiency with that of SAC.
Abstract:In many deep reinforcement learning settings, when an agent takes an action, it repeats the same action a predefined number of times without observing the states until the next action-decision point. This technique of action repetition has several merits in training the agent, but the data between action-decision points (i.e., intermediate frames) are, in effect, discarded. Since the amount of training data is inversely proportional to the interval of action repeats, they can have a negative impact on the sample efficiency of training. In this paper, we propose a simple but effective approach to alleviate to this problem by introducing the concept of pseudo-actions. The key idea of our method is making the transition between action-decision points usable as training data by considering pseudo-actions. Pseudo-actions for continuous control tasks are obtained as the average of the action sequence straddling an action-decision point. For discrete control tasks, pseudo-actions are computed from learned action embeddings. This method can be combined with any model-free reinforcement learning algorithm that involves the learning of Q-functions. We demonstrate the effectiveness of our approach on both continuous and discrete control tasks in OpenAI Gym.