Abstract:Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable, how can an agent learn such a state representation, and how can it detect when it has found one? We introduce a metric that can accomplish both objectives, without requiring access to--or knowledge of--an underlying, unobservable state space. Our metric, the $\lambda$-discrepancy, is the difference between two distinct temporal difference (TD) value estimates, each computed using TD($\lambda$) with a different value of $\lambda$. Since TD($\lambda$=0) makes an implicit Markov assumption and TD($\lambda$=1) does not, a discrepancy between these estimates is a potential indicator of a non-Markovian state representation. Indeed, we prove that the $\lambda$-discrepancy is exactly zero for all Markov decision processes and almost always non-zero for a broad class of partially observable environments. We also demonstrate empirically that, once detected, minimizing the $\lambda$-discrepancy can help with learning a memory function to mitigate the corresponding partial observability. We then train a reinforcement learning agent that simultaneously constructs two recurrent value networks with different $\lambda$ parameters and minimizes the difference between them as an auxiliary loss. The approach scales to challenging partially observable domains, where the resulting agent frequently performs significantly better (and never performs worse) than a baseline recurrent agent with only a single value network.
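The core quantity can be illustrated on a toy example. Below is a minimal tabular sketch (not the paper's recurrent-network agent): it simulates a small aliased POMDP under a fixed policy, forms the TD($\lambda$=1) (Monte Carlo) and TD($\lambda$=0) value estimates over observations, and reports their difference. The toy environment and all variable names are invented for illustration.

```python
# Minimal sketch of the lambda-discrepancy on a toy aliased POMDP:
# compare TD(lambda=1) and TD(lambda=0) value estimates over observations.
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9

# Two tiny latent corridors sharing the aliased observation 'X':
#   corridor A: 0 -> 1 -> 2 (terminal, reward +1); states 0 and 1 both look like 'X'
#   corridor B: 3 -> 4      (terminal, reward -1); state 3 also looks like 'X'
NEXT = {0: 1, 1: 2, 2: None, 3: 4, 4: None}
REWARD = {0: 0.0, 1: 0.0, 2: +1.0, 3: 0.0, 4: -1.0}
OBS = {0: 'X', 1: 'X', 2: 'good', 3: 'X', 4: 'bad'}

def rollout():
    s = rng.choice([0, 3])          # start in one corridor uniformly at random
    traj = []
    while s is not None:
        traj.append((OBS[s], REWARD[s]))
        s = NEXT[s]
    return traj

episodes = [rollout() for _ in range(5000)]
observations = sorted({o for ep in episodes for o, _ in ep})
idx = {o: i for i, o in enumerate(observations)}
n = len(observations)

# TD(lambda=1), i.e. Monte Carlo: average discounted return after each visit.
returns = {o: [] for o in observations}
for ep in episodes:
    g = 0.0
    for o, r in reversed(ep):
        g = r + gamma * g
        returns[o].append(g)
v_mc = np.array([np.mean(returns[o]) for o in observations])

# TD(lambda=0): solve v = r + gamma * P v with an empirical one-step model
# over observations -- this is where the implicit Markov assumption enters.
r_hat, counts = np.zeros(n), np.zeros(n)
P_hat = np.zeros((n, n))
for ep in episodes:
    for t, (o, r) in enumerate(ep):
        i = idx[o]
        counts[i] += 1
        r_hat[i] += r
        if t + 1 < len(ep):
            P_hat[i, idx[ep[t + 1][0]]] += 1
r_hat /= counts
P_hat /= counts[:, None]
v_td0 = np.linalg.solve(np.eye(n) - gamma * P_hat, r_hat)

# A clearly nonzero gap at 'X' flags that observation as non-Markov.
print(dict(zip(observations, np.round(np.abs(v_td0 - v_mc), 3))))
```

On this toy problem the two estimates agree on the unaliased observations and disagree only on the aliased one, which is exactly the signal the abstract proposes to minimize.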
Abstract:Do neural networks learn to implement algorithms such as look-ahead or search "in the wild"? Or do they rely purely on collections of simple heuristics? We present evidence of learned look-ahead in the policy network of Leela Chess Zero, the currently strongest neural chess engine. We find that Leela internally represents future optimal moves and that these representations are crucial for its final output in certain board states. Concretely, we exploit the fact that Leela is a transformer that treats every chessboard square like a token in language models, and give three lines of evidence: (1) activations on certain squares of future moves are unusually important causally; (2) we find attention heads that move important information "forward and backward in time," e.g., from squares of future moves to squares of earlier ones; and (3) we train a simple probe that can predict the optimal move 2 turns ahead with 92% accuracy (in board states where Leela finds a single best line). These findings are an existence proof of learned look-ahead in neural networks and might be a step towards a better understanding of their capabilities.
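As a rough illustration of what such a probe looks like, the sketch below fits a linear classifier from per-square activations to the target square of the move two turns ahead. The activations and labels are random placeholders (so accuracy here will be near chance); the real experiment extracts activations from Leela, which is not reproduced in this snippet.

```python
# Hedged sketch of a "simple probe": a linear classifier from per-square
# activations to the target square of the best move two turns ahead.
# All data below is random placeholder noise, not real Leela activations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_positions, n_squares, d_model = 1200, 64, 16   # illustrative sizes only

# Placeholder layer activations: one d_model vector per board square.
acts = rng.normal(size=(n_positions, n_squares, d_model))
# Placeholder labels: index (0-63) of the target square of the move two
# turns ahead, e.g. taken from an oracle engine line.
labels = rng.integers(0, n_squares, size=n_positions)

X = acts.reshape(n_positions, n_squares * d_model)   # flatten the board
probe = LogisticRegression(max_iter=1000)
probe.fit(X[:1000], labels[:1000])
print("held-out probe accuracy:", probe.score(X[1000:], labels[1000:]))
```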
Abstract:We study the action generalization ability of deep Q-learning in discrete action spaces. Generalization is crucial for efficient reinforcement learning (RL) because it allows agents to use knowledge learned from past experiences on new tasks. But while function approximation provides deep RL agents with a natural way to generalize over state inputs, the same generalization mechanism does not apply to discrete action outputs. And yet, surprisingly, our experiments indicate that Deep Q-Networks (DQN), which use exactly this type of function approximator, are still able to achieve modest action generalization. Our main contribution is twofold: first, we propose a method of evaluating action generalization using expert knowledge of action similarity, and empirically confirm that action generalization leads to faster learning; second, we characterize the action-generalization gap (the difference in learning performance between DQN and the expert) in different domains. We find that DQN can indeed generalize over actions in several simple domains, but that its ability to do so decreases as the action space grows larger.
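One way to make "expert knowledge of action similarity" concrete is sketched below: a tabular Q-learner that broadcasts each TD update to similar actions in proportion to a given similarity matrix. This is only an illustrative stand-in, not the paper's exact expert construction; the toy environment, similarity matrix, and hyperparameters are invented.

```python
# Illustrative sketch: similarity-weighted tabular Q-learning, one plausible
# way an "expert" agent could generalize over discrete actions.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 10, 8
alpha, gamma, eps = 0.1, 0.95, 0.1

# Placeholder expert knowledge: similarity[a, b] in (0, 1], 1 on the diagonal.
similarity = np.exp(-np.abs(np.subtract.outer(np.arange(n_actions),
                                              np.arange(n_actions))) / 2.0)

# Toy chain environment: action a moves right with probability (a+1)/n_actions.
def step(s, a):
    s_next = min(s + 1, n_states - 1) if rng.random() < (a + 1) / n_actions else s
    reward = 1.0 if s_next == n_states - 1 else 0.0
    done = s_next == n_states - 1
    return s_next, reward, done

Q = np.zeros((n_states, n_actions))
for _ in range(500):
    s, done = 0, False
    while not done:
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next, r, done = step(s, a)
        td_error = r + (0.0 if done else gamma * Q[s_next].max()) - Q[s, a]
        # Generalize: update all actions in proportion to their similarity to a.
        Q[s] += alpha * similarity[a] * td_error
        s = s_next
print(Q.round(2))
```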
Abstract:Principled decision-making in continuous state--action spaces is impossible without some assumptions. A common approach is to assume Lipschitz continuity of the Q-function. We show that, unfortunately, this property fails to hold in many typical domains. We propose a new coarse-grained smoothness definition that generalizes the notion of Lipschitz continuity, is more widely applicable, and allows us to compute significantly tighter bounds on Q-functions, leading to improved learning. We provide a theoretical analysis of our new smoothness definition, and discuss its implications and impact on control and exploration in continuous domains.
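For reference, the Lipschitz-continuity assumption being relaxed is the standard one:
$$ |Q(s,a) - Q(s',a')| \;\le\; L \, d\big((s,a),(s',a')\big) \quad \text{for all } (s,a),\,(s',a'), $$
which turns a single estimate $Q(s,a)$ into the interval bound $Q(s',a') \in [\,Q(s,a) - L\,d,\; Q(s,a) + L\,d\,]$ on nearby state-action pairs. The coarse-grained smoothness definition proposed in the paper generalizes exactly this constraint; its precise form is not reproduced here.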
Abstract:Reinforcement learning is hard in general. Yet, in many specific environments, learning is easy. What makes learning easy in one environment, but difficult in another? We address this question by proposing a simple measure of reinforcement-learning hardness called the bad-policy density. This quantity measures the fraction of the deterministic stationary policy space that is below a desired threshold in value. We prove that this simple quantity has many properties one would expect of a measure of learning hardness. Further, we prove it is NP-hard to compute the measure in general, but there are paths to polynomial-time approximation. We conclude by summarizing potential directions and uses for this measure.
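The measure itself is simple to compute exactly on small problems. Below is a minimal sketch that enumerates every deterministic stationary policy of a randomly generated toy MDP, evaluates each in closed form, and reports the fraction falling below a (here arbitrary) value threshold; the MDP, start state, and threshold are all illustrative choices.

```python
# Minimal sketch of the bad-policy density on a randomly generated toy MDP.
import itertools
import numpy as np

gamma = 0.95
n_states, n_actions = 4, 2
rng = np.random.default_rng(0)

# Toy MDP: P[s, a] is a distribution over next states, R[s, a] a reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0, 1, size=(n_states, n_actions))

def policy_value(pi, start=0):
    """Exact value of a deterministic policy pi (tuple of actions) at `start`."""
    P_pi = np.array([P[s, pi[s]] for s in range(n_states)])
    R_pi = np.array([R[s, pi[s]] for s in range(n_states)])
    v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
    return v[start]

values = np.array([policy_value(pi)
                   for pi in itertools.product(range(n_actions), repeat=n_states)])
threshold = 0.9 * values.max()      # arbitrary choice: "within 90% of optimal"
bad_policy_density = np.mean(values < threshold)
print(f"{bad_policy_density:.2%} of the {values.size} deterministic policies are 'bad'")
```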
Abstract:The fundamental assumption of reinforcement learning in Markov decision processes (MDPs) is that the relevant decision process is, in fact, Markov. However, when MDPs have rich observations, agents typically learn by way of an abstract state representation, and such representations are not guaranteed to preserve the Markov property. We introduce a novel set of conditions and prove that they are sufficient for learning a Markov abstract state representation. We then describe a practical training procedure that combines inverse model estimation and temporal contrastive learning to learn an abstraction that approximately satisfies these conditions. Our novel training objective is compatible with both online and offline training: it does not require a reward signal, but agents can capitalize on reward information when available. We empirically evaluate our approach on a visual gridworld domain and a set of continuous control benchmarks. Our approach learns representations that capture the underlying structure of the domain and lead to improved sample efficiency over state-of-the-art deep reinforcement learning with visual features -- often matching or exceeding the performance achieved with hand-designed compact state information.
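A schematic sketch of such a combined objective is given below: an encoder phi is trained jointly with an inverse model (predict the action from consecutive abstract states) and a temporal-contrastive discriminator (distinguish real consecutive pairs from shuffled ones). Network sizes, batch construction, and the equal loss weighting are placeholders, not the paper's exact setup.

```python
# Schematic sketch of an inverse-model + temporal-contrastive training step
# for a state abstraction phi, with placeholder data standing in for a replay buffer.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, z_dim, n_actions, batch = 32, 8, 4, 64

phi = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))
inverse_model = nn.Sequential(nn.Linear(2 * z_dim, 64), nn.ReLU(),
                              nn.Linear(64, n_actions))
contrastive_head = nn.Sequential(nn.Linear(2 * z_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
opt = torch.optim.Adam([*phi.parameters(), *inverse_model.parameters(),
                        *contrastive_head.parameters()], lr=1e-3)

# Placeholder batch of (o_t, a_t, o_{t+1}) transitions.
obs = torch.randn(batch, obs_dim)
next_obs = torch.randn(batch, obs_dim)
actions = torch.randint(0, n_actions, (batch,))

z, z_next = phi(obs), phi(next_obs)

# Inverse-model loss: predict a_t from (phi(o_t), phi(o_{t+1})).
inv_logits = inverse_model(torch.cat([z, z_next], dim=-1))
inverse_loss = F.cross_entropy(inv_logits, actions)

# Temporal-contrastive loss: distinguish true consecutive pairs from pairs
# whose successor is shuffled within the batch.
z_fake = z_next[torch.randperm(batch)]
pairs = torch.cat([torch.cat([z, z_next], -1), torch.cat([z, z_fake], -1)])
targets = torch.cat([torch.ones(batch, 1), torch.zeros(batch, 1)])
contrastive_loss = F.binary_cross_entropy_with_logits(contrastive_head(pairs), targets)

loss = inverse_loss + contrastive_loss   # equal weighting is an arbitrary choice
opt.zero_grad(); loss.backward(); opt.step()
print(float(inverse_loss), float(contrastive_loss))
```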
Abstract:The difficulty of classical planning increases exponentially with search-tree depth. Heuristic search can make planning more efficient, but good heuristics often require domain-specific assumptions and may not generalize to new problems. Rather than treating the planning problem as fixed and carefully designing a heuristic to match it, we instead construct macro-actions that support efficient planning with the simple and general-purpose "goal-count" heuristic. Our approach searches for macro-actions that modify only a small number of state variables (we call this measure "entanglement"). We show experimentally that reducing entanglement exponentially decreases planning time with the goal-count heuristic. Our method discovers macro-actions with disentangled effects that dramatically improve planning efficiency for 15-puzzle and Rubik's cube, reliably solving each domain without prior knowledge, and solving Rubik's cube with orders of magnitude less data than competing approaches.
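The two quantities this approach is built on are easy to state concretely. The sketch below, using a toy factored state represented as a dict of variables, computes the entanglement of a macro-action (how many state variables its effect modifies) and the goal-count heuristic (how many goal variables remain unsatisfied); the example macro and goal are made up.

```python
# Small sketch of "entanglement" and the goal-count heuristic on a toy
# factored state (variable -> value). The macro and goal are invented examples.
def entanglement(state, macro):
    """Number of state variables a macro-action's effect actually modifies."""
    next_state = macro(state)
    return sum(1 for var in state if next_state[var] != state[var])

def goal_count(state, goal):
    """Goal-count heuristic: number of goal variables not yet satisfied."""
    return sum(1 for var, value in goal.items() if state[var] != value)

# Toy 15-puzzle-style example: a macro that swaps the blank with tile 5
# touches only two variables, so its entanglement is 2.
state = {"blank": (0, 0), "tile5": (0, 1), "tile7": (1, 1)}
goal = {"tile5": (0, 0), "tile7": (1, 1)}
swap_macro = lambda s: {**s, "blank": s["tile5"], "tile5": s["blank"]}

print(entanglement(state, swap_macro))   # -> 2
print(goal_count(state, goal))           # -> 1
```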