Title: VDSC: Enhancing Exploration Timing with Value Discrepancy and State Counts

Mar 26, 2024

Marius Captari, Remo Sasso, Matthia Sabatelli


Abstract: Despite the considerable attention given to the questions of how much and how to explore in deep reinforcement learning, the question of when to explore remains comparatively under-researched. While more sophisticated exploration strategies can excel in specific, often sparse-reward environments, simpler existing approaches such as $\epsilon$-greedy continue to outperform them across a broader spectrum of domains. The appeal of these simpler strategies lies in their ease of implementation and their generality across a wide range of domains. The downside is that these methods amount to a blind switching mechanism that completely disregards the agent's internal state. In this paper, we propose to leverage the agent's internal state to decide when to explore, addressing the shortcomings of blind switching mechanisms. We present Value Discrepancy and State Counts through homeostasis (VDSC), a novel approach for efficient exploration timing. Experimental results on the Atari suite demonstrate the superiority of our strategy over traditional methods such as $\epsilon$-greedy and Boltzmann, as well as more sophisticated techniques like Noisy Nets.
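
To make the "blind switching" point concrete, here is a minimal sketch of the two traditional baselines named in the abstract, $\epsilon$-greedy and Boltzmann action selection. It is not VDSC, whose trigger mechanism is not described on this page. The function names, the `rng` parameter, and the list-of-floats representation of Q-values are illustrative assumptions.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Explore with probability epsilon, otherwise act greedily."""
    # The explore/exploit switch depends only on a random draw,
    # never on q_values or any other internal state of the agent.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def boltzmann(q_values, temperature, rng=random):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    q_max = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights, k=1)[0]
```

In `epsilon_greedy`, the decision to explore is made before the Q-values are even consulted, which is the sense in which the switch is blind. A state-aware scheme would replace the `rng.random() < epsilon` test with a condition computed from the agent's own signals.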
