Title: Optimistic Exploration even with a Pessimistic Initialisation

Feb 26, 2020

Tabish Rashid, Bei Peng, Wendelin Böhmer, Shimon Whiteson

Abstract: Optimistic initialisation is an effective strategy for efficient exploration in reinforcement learning (RL). In the tabular case, all provably efficient model-free algorithms rely on it. However, model-free deep RL algorithms do not use optimistic initialisation despite taking inspiration from these provably efficient tabular algorithms. In particular, in scenarios with only positive rewards, Q-values are initialised at their lowest possible values due to commonly used network initialisation schemes, a pessimistic initialisation. Merely initialising the network to output optimistic Q-values is not enough, since we cannot ensure that they remain optimistic for novel state-action pairs, which is crucial for exploration. We propose a simple count-based augmentation to pessimistically initialised Q-values that separates the source of optimism from the neural network. We show that this scheme is provably efficient in the tabular setting and extend it to the deep RL setting. Our algorithm, Optimistic Pessimistically Initialised Q-Learning (OPIQ), augments the Q-value estimates of a DQN-based agent with count-derived bonuses to ensure optimism during both action selection and bootstrapping. We show that OPIQ outperforms non-optimistic DQN variants that utilise a pseudocount-based intrinsic motivation in hard exploration tasks, and that it predicts optimistic estimates for novel state-action pairs.

* Published as a conference paper at ICLR 2020
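
To make the idea in the abstract concrete, here is a minimal tabular sketch of a count-based optimistic augmentation. It is not the authors' implementation. The abstract does not give the form of the bonus, so the sketch assumes one of the shape C / (N(s, a) + 1)^M, where N(s, a) is a visit count. The class name `OptimisticTabularQ` and the parameters `C`, `M`, `alpha`, and `gamma` are illustrative choices. Q-values start at zero (pessimistic when rewards are positive), and the bonus is added both when choosing actions and when forming the bootstrap target.

```python
from collections import defaultdict


class OptimisticTabularQ:
    """Tabular Q-learning with a count-based optimistic bonus (illustrative sketch)."""

    def __init__(self, n_actions, C=1.0, M=2.0, alpha=0.5, gamma=0.99):
        self.n_actions = n_actions
        self.C = C          # assumed bonus scale (hypothetical hyperparameter)
        self.M = M          # assumed bonus decay exponent (hypothetical hyperparameter)
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor
        self.q = defaultdict(float)  # pessimistic initialisation: every Q-value starts at 0
        self.n = defaultdict(int)    # visit counts N(s, a)

    def bonus(self, s, a):
        # Assumed bonus form: large for unvisited pairs, shrinking as the count grows.
        return self.C / (self.n[(s, a)] + 1) ** self.M

    def q_plus(self, s, a):
        # Augmented estimate: the optimism comes from the counts, not from Q itself.
        return self.q[(s, a)] + self.bonus(s, a)

    def act(self, s):
        # Greedy action selection with respect to the augmented values.
        return max(range(self.n_actions), key=lambda a: self.q_plus(s, a))

    def update(self, s, a, r, s_next, done):
        self.n[(s, a)] += 1
        target = r
        if not done:
            # Bootstrap from the augmented values of the next state.
            target += self.gamma * max(
                self.q_plus(s_next, b) for b in range(self.n_actions)
            )
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])


agent = OptimisticTabularQ(n_actions=2)
first = agent.act("s0")                       # both actions tie, so the first is chosen
agent.update("s0", first, 0.0, "s1", done=True)
second = agent.act("s0")                      # the untried action now has the larger bonus
```

In this sketch, a state-action pair that has never been visited keeps its full bonus no matter how the stored Q-values change, which is the property the abstract describes as separating the source of optimism from the value estimator. In the deep RL setting the abstract mentions a DQN-based agent and pseudocounts; those parts are not modelled here.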

View paper on OpenReview
