Zhikang T. Wang

A Convergent and Efficient Deep Q Network Algorithm

Jun 29, 2021

Zhikang T. Wang, Masahito Ueda

Abstract: Despite the empirical success of the deep Q network (DQN) reinforcement learning algorithm and its variants, DQN is still not well understood, and it does not guarantee convergence. In this work, we show that DQN can diverge and cease to operate in realistic settings. Although gradient-based convergent methods exist, we show that they have inherent problems in their learning behaviour and elucidate why they often fail in practice. To overcome these problems, we propose a convergent DQN algorithm (C-DQN) obtained by carefully modifying DQN, and we show that the algorithm is convergent and can work with large discount factors (0.9998). It learns robustly in difficult settings and, within a moderate computational budget, can learn several difficult games in the Atari 2600 benchmark where DQN fails. Our code has been publicly released and can be used to reproduce our results.
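The abstract contrasts DQN, which regresses toward a target computed by a frozen target network, with gradient-based convergent methods that minimize the mean squared Bellman error (MSBE) using the online network itself. The scalar sketch below computes both per-sample losses. It assumes, based on our reading of the C-DQN paper rather than on anything stated in this abstract, that C-DQN takes the larger of the two losses for each sample. The function names are hypothetical, and only loss values are computed: in a real implementation the DQN term stops gradients through the target network, while the MSBE term backpropagates through both Q(s, a) and Q(s', ·).

```python
from typing import Sequence


def td_target(reward: float, done: bool, next_q: Sequence[float], gamma: float) -> float:
    """One-step Bellman target r + gamma * max_a' Q(s', a'); no bootstrap at terminal states."""
    bootstrap = 0.0 if done else gamma * max(next_q)
    return reward + bootstrap


def cdqn_loss_sketch(q_sa: float, reward: float, done: bool,
                     next_q_online: Sequence[float], next_q_target: Sequence[float],
                     gamma: float) -> float:
    """Hypothetical per-sample loss: the maximum of a DQN loss and an MSBE loss (assumed C-DQN rule)."""
    # DQN loss: the target comes from the frozen target network.
    l_dqn = (q_sa - td_target(reward, done, next_q_target, gamma)) ** 2
    # MSBE (residual) loss: the target comes from the online network itself.
    l_msbe = (q_sa - td_target(reward, done, next_q_online, gamma)) ** 2
    # Assumption: C-DQN keeps whichever of the two losses is larger for this sample.
    return max(l_dqn, l_msbe)


print(cdqn_loss_sketch(1.0, 0.5, False, [1.0, 2.0], [0.0, 1.0], 0.5))  # prints 0.25
```

In this example the DQN target (1.0) already matches Q(s, a), so the DQN loss is zero, but the online network's own Bellman target (1.5) does not; taking the maximum keeps that residual error from being ignored.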

LaProp: a Better Way to Combine Momentum with Adaptive Gradient

Feb 12, 2020

Liu Ziyin, Zhikang T. Wang, Masahito Ueda

Abstract: Identifying a divergence problem in Adam, we propose a new optimizer, LaProp, which belongs to the family of adaptive gradient descent methods. This method allows greater flexibility in choosing its hyperparameters, reduces the effort of fine-tuning, and permits straightforward interpolation between signed gradient methods and adaptive gradient methods. We bound the regret of LaProp on a convex problem and show that our bound differs from those of previous methods by a key factor, which demonstrates its advantage. We experimentally show that LaProp outperforms previous methods on a toy task with noisy gradients, optimization of extremely deep fully connected networks, neural art style transfer, natural language processing using transformers, and reinforcement learning with deep Q networks. The performance improvement of LaProp is consistent and sometimes dramatic and qualitative.
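The abstract does not spell out the update rule. As we understand the LaProp paper, its central change relative to Adam is the order of operations: each gradient is first divided by the running root mean square of past gradients, and only then is momentum accumulated, whereas Adam accumulates momentum of raw gradients and divides afterwards. The pure-Python sketch below assumes that ordering with bias correction on both moments. The class name `LaPropSketch` and its default values are hypothetical and illustrative, not the authors' code or recommended hyperparameters. Setting `nu=0` makes the normalized gradient equal to sign(g), giving momentum SGD on signed gradients, which is one way to read the interpolation between signed and adaptive gradient methods mentioned above.

```python
import math
from typing import List


class LaPropSketch:
    """Hypothetical minimal LaProp-style optimizer: normalize first, then apply momentum."""

    def __init__(self, n_params: int, lr: float = 0.1, mu: float = 0.9,
                 nu: float = 0.999, eps: float = 1e-15) -> None:
        self.lr, self.mu, self.nu, self.eps = lr, mu, nu, eps
        self.m: List[float] = [0.0] * n_params  # momentum of *normalized* gradients
        self.n: List[float] = [0.0] * n_params  # running mean of squared raw gradients
        self.t = 0

    def step(self, params: List[float], grads: List[float]) -> None:
        self.t += 1
        for i, g in enumerate(grads):
            # 1) Second-moment estimate with bias correction, as in Adam/RMSProp.
            self.n[i] = self.nu * self.n[i] + (1 - self.nu) * g * g
            n_hat = self.n[i] / (1 - self.nu ** self.t)
            # 2) Normalize the gradient before it enters the momentum buffer.
            g_norm = g / (math.sqrt(n_hat) + self.eps)
            # 3) Momentum acts on the normalized gradient (Adam divides the momentum instead).
            self.m[i] = self.mu * self.m[i] + (1 - self.mu) * g_norm
            m_hat = self.m[i] / (1 - self.mu ** self.t)
            params[i] -= self.lr * m_hat


# Usage: one step on f(x) = x^2 starting at x = 1.0 (gradient 2x).
x = [1.0]
opt = LaPropSketch(n_params=1, lr=0.1)
opt.step(x, [2 * x[0]])
print(x[0])  # close to 0.9: the first normalized, bias-corrected step has magnitude lr
```

Because the first normalized gradient has unit magnitude after bias correction, the first step moves the parameter by about `lr` regardless of the raw gradient scale.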

Deep Reinforcement Learning Control of Quantum Cartpoles

Oct 24, 2019

Zhikang T. Wang, Yuto Ashida, Masahito Ueda

Abstract: We generalize a standard reinforcement learning benchmark, the classical cartpole balancing problem, to the quantum regime by stabilizing a particle in an unstable potential through measurement and feedback. We use state-of-the-art deep reinforcement learning to stabilize the quantum cartpole and find that our approach performs comparably to or better than other strategies from standard control theory. Our approach also applies to measurement-feedback cooling of quantum oscillators, showing that deep learning is applicable to general continuous-space quantum control.

* 5+3 pages, 2 figures, 2+2 tables, 5 videos at an external link