Title: Satisficing Exploration for Deep Reinforcement Learning

Jul 16, 2024

Dilip Arumugam, Saurabh Kumar, Ramki Gummadi, Benjamin Van Roy

Abstract: A default assumption in the design of reinforcement-learning algorithms is that a decision-making agent always explores to learn optimal behavior. In sufficiently complex environments that approach the vastness and scale of the real world, however, attaining optimal performance may in fact be an entirely intractable endeavor, and an agent may seldom find itself in a position to complete the requisite exploration for identifying an optimal policy. Recent work has leveraged tools from information theory to design agents that deliberately forgo optimal solutions in favor of sufficiently satisfying, or satisficing, solutions obtained through lossy compression. Notably, such agents may employ fundamentally different exploratory decisions to learn satisficing behaviors more efficiently than optimal ones, which are more data-intensive. While supported by a rigorous corroborating theory, the underlying algorithm relies on model-based planning, drastically limiting the compatibility of these ideas with function approximation and high-dimensional observations. In this work, we remedy this issue by extending an agent that directly represents uncertainty over the optimal value function, allowing it both to bypass the need for model-based planning and to learn satisficing policies. We provide simple yet illustrative experiments that demonstrate how our algorithm enables deep reinforcement-learning agents to achieve satisficing behaviors. In keeping with previous work on this setting for multi-armed bandits, we additionally find that our algorithm is capable of synthesizing optimal behaviors, when feasible, more efficiently than its non-information-theoretic counterpart.

* Accepted to the Finding the Frame Workshop at RLC 2024
