Michael Littman

Knowledge Retention for Continual Model-Based Reinforcement Learning

Mar 06, 2025

Yixiang Sun, Haotian Fu, Michael Littman, George Konidaris

Abstract: We propose DRAGO, a novel approach for continual model-based reinforcement learning aimed at improving the incremental development of world models across a sequence of tasks that differ in their reward functions but not the state space or dynamics. DRAGO comprises two key components: Synthetic Experience Rehearsal, which leverages generative models to create synthetic experiences from past tasks, allowing the agent to reinforce previously learned dynamics without storing data, and Regaining Memories Through Exploration, which introduces an intrinsic reward mechanism to guide the agent toward revisiting relevant states from prior tasks. Together, these components enable the agent to maintain a comprehensive and continually developing world model, facilitating more effective learning and adaptation across diverse environments. Empirical evaluations demonstrate that DRAGO is able to preserve knowledge across tasks, achieving superior performance in various continual learning scenarios.

Computably Continuous Reinforcement-Learning Objectives are PAC-learnable

Mar 19, 2023

Cambridge Yang, Michael Littman, Michael Carbin

Abstract: In reinforcement learning, the classic objectives of maximizing discounted and finite-horizon cumulative rewards are PAC-learnable: There are algorithms that learn a near-optimal policy with high probability using a finite amount of samples and computation. In recent years, researchers have introduced objectives and corresponding reinforcement-learning algorithms beyond the classic cumulative rewards, such as objectives specified as linear temporal logic formulas. However, questions about the PAC-learnability of these new objectives have remained open. This work demonstrates the PAC-learnability of general reinforcement-learning objectives through sufficient conditions for PAC-learnability in two analysis settings. In particular, for the analysis that considers only sample complexity, we prove that if an objective given as an oracle is uniformly continuous, then it is PAC-learnable. Further, for the analysis that considers computational complexity, we prove that if an objective is computable, then it is PAC-learnable. In other words, if a procedure computes successive approximations of the objective's value, then the objective is PAC-learnable. We give three applications of our condition on objectives from the literature with previously unknown PAC-learnability and prove that these objectives are PAC-learnable. Overall, our result helps verify existing objectives' PAC-learnability. Also, as some studied objectives that are not uniformly continuous have been shown to be not PAC-learnable, our results could guide the design of new PAC-learnable objectives.
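For readers unfamiliar with the term, the guarantee the abstract describes ("a near-optimal policy with high probability using a finite amount of samples") can be written informally as follows. The notation is assumed here for illustration and is not taken from the paper: $J$ is the value of the objective, $\hat{\pi}_N$ is the policy returned after $N$ samples, $\epsilon$ is the accuracy, and $\delta$ is the allowed failure probability.

```latex
\forall \epsilon > 0,\ \forall \delta \in (0, 1),\ \exists N < \infty :\quad
\Pr\!\left[\, J(\hat{\pi}_N) \;\ge\; \sup_{\pi} J(\pi) - \epsilon \,\right] \;\ge\; 1 - \delta
```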

Meta-Learning Transferable Parameterized Skills

Jun 07, 2022

Haotian Fu, Shangqun Yu, Saket Tiwari, George Konidaris, Michael Littman

Abstract: We propose a novel parameterized skill-learning algorithm that aims to learn transferable parameterized skills and synthesize them into a new action space that supports efficient learning in long-horizon tasks. We first propose novel learning objectives, trajectory-centric diversity and smoothness, that allow an agent to meta-learn reusable parameterized skills. Our agent can use these learned skills to construct a temporally-extended parameterized-action Markov decision process, for which we propose a hierarchical actor-critic algorithm that aims to efficiently learn a high-level control policy with the learned skills. We empirically demonstrate that the proposed algorithms enable an agent to solve a complicated long-horizon obstacle-course environment.

Does DQN really learn? Exploring adversarial training schemes in Pong

Mar 20, 2022

Bowen He, Sreehari Rammohan, Jessica Forde, Michael Littman

Abstract: In this work, we study two self-play training schemes, Chainer and Pool, and show they lead to improved agent performance in Atari Pong compared to a standard DQN agent trained against the built-in Atari opponent. To measure agent performance, we define a robustness metric that captures how difficult it is to learn a strategy that beats the agent's learned policy. Through playing past versions of themselves, Chainer and Pool are able to target weaknesses in their policies and improve their resistance to attack. Agents trained using these methods score well on our robustness metric and can easily defeat the standard DQN agent. We conclude by using linear probing to illuminate what internal structures the different agents develop to play the game. We show that training agents with Chainer or Pool leads to richer network activations with greater predictive power to estimate critical game-state features compared to the standard DQN agent.

* RLDM 2022

Learning Generalizable Behavior via Visual Rewrite Rules

Dec 09, 2021

Yiheng Xie, Mingxuan Li, Shangqun Yu, Michael Littman

Abstract: Though deep reinforcement learning agents have achieved unprecedented success in recent years, their learned policies can be brittle, failing to generalize to even slight modifications of their environments or unfamiliar situations. The black-box nature of the neural network learning dynamics makes it impossible to audit trained deep agents and recover from such failures. In this paper, we propose a novel representation and learning approach to capture environment dynamics without using neural networks. It originates from the observation that, in games designed for people, the effect of an action can often be perceived in the form of local changes in consecutive visual observations. Our algorithm is designed to extract such vision-based changes and condense them into a set of action-dependent descriptive rules, which we call "visual rewrite rules" (VRRs). We also present preliminary results from a VRR agent that can explore, expand its rule set, and solve a game via planning with its learned VRR world model. In several classical games, our non-deep agent demonstrates superior performance, extreme sample efficiency, and robust generalization ability compared with several mainstream deep agents.

* AAAI 2022 Workshop on Reinforcement Learning in Games

Reinforcement Learning for General LTL Objectives Is Intractable

Nov 24, 2021

Cambridge Yang, Michael Littman, Michael Carbin

Abstract: In recent years, researchers have made significant progress in devising reinforcement-learning algorithms for optimizing linear temporal logic (LTL) objectives and LTL-like objectives. Despite these advancements, there are fundamental limitations to how well this problem can be solved that previous studies have alluded to but, to our knowledge, have not examined in depth. In this paper, we address theoretically the hardness of learning with general LTL objectives. We formalize the problem under the probably approximately correct learning in Markov decision processes (PAC-MDP) framework, a standard framework for measuring sample complexity in reinforcement learning. In this formalization, we prove that the optimal policy for any LTL formula is PAC-MDP-learnable only if the formula is in the most limited class in the LTL hierarchy, consisting of only finite-horizon-decidable properties. Practically, our result implies that it is impossible for a reinforcement-learning algorithm to obtain a PAC-MDP guarantee on the performance of its learned policy after finitely many interactions with an unconstrained environment for non-finite-horizon-decidable LTL objectives.

Learning Finite Linear Temporal Logic Specifications with a Specialized Neural Operator

Nov 21, 2021

Homer Walke, Daniel Ritter, Carl Trimbach, Michael Littman

Abstract: Finite linear temporal logic ($\mathsf{LTL}_f$) is a powerful formal representation for modeling temporal sequences. We address the problem of learning a compact $\mathsf{LTL}_f$ formula from labeled traces of system behavior. We propose a novel neural network operator and evaluate the resulting architecture, Neural$\mathsf{LTL}_f$. Our approach includes a specialized recurrent filter, designed to subsume $\mathsf{LTL}_f$ temporal operators, to learn a highly accurate classifier for traces. Then, it discretizes the activations and extracts the truth table represented by the learned weights. This truth table is converted to symbolic form and returned as the learned formula. Experiments on randomly generated $\mathsf{LTL}_f$ formulas show Neural$\mathsf{LTL}_f$ scales to larger formula sizes than existing approaches and maintains high accuracy even in the presence of noise.
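To make "labeled traces" concrete, here is a minimal sketch of how an $\mathsf{LTL}_f$ formula classifies a finite trace under the standard finite-trace semantics. It is not the paper's neural operator. The tuple encoding of formulas, the operator names, and the function `holds` are assumptions made for this illustration.

```python
from typing import List, Set, Union

# Hypothetical encoding: an atom is a string; compound formulas are tuples
# such as ("until", "a", "b") or ("always", ("not", "c")).
Formula = Union[str, tuple]
Trace = List[Set[str]]  # each step is the set of atoms true at that step


def holds(formula: Formula, trace: Trace, i: int = 0) -> bool:
    """Return True if `formula` holds at position i of a non-empty finite trace."""
    if isinstance(formula, str):
        return formula in trace[i]
    op, *args = formula
    if op == "not":
        return not holds(args[0], trace, i)
    if op == "and":
        return holds(args[0], trace, i) and holds(args[1], trace, i)
    if op == "or":
        return holds(args[0], trace, i) or holds(args[1], trace, i)
    if op == "next":
        # Strong next: a successor step must exist within the finite trace.
        return i + 1 < len(trace) and holds(args[0], trace, i + 1)
    if op == "eventually":
        return any(holds(args[0], trace, j) for j in range(i, len(trace)))
    if op == "always":
        return all(holds(args[0], trace, j) for j in range(i, len(trace)))
    if op == "until":
        # Some step j satisfies the right operand, and the left operand
        # holds at every step from i up to (but not including) j.
        return any(
            holds(args[1], trace, j)
            and all(holds(args[0], trace, k) for k in range(i, j))
            for j in range(i, len(trace))
        )
    raise ValueError(f"unknown operator: {op!r}")


trace = [{"a"}, {"a"}, {"b"}]
print(holds(("until", "a", "b"), trace))  # True
print(holds(("always", "a"), trace))      # False
```

A learner in this setting receives many such traces, each labeled with the Boolean outcome, and must recover a formula consistent with the labels.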

* 10 pages, 5 figures

Coarse-Grained Smoothness for RL in Metric Spaces

Oct 23, 2021

Omer Gottesman, Kavosh Asadi, Cameron Allen, Sam Lobel, George Konidaris, Michael Littman

Abstract: Principled decision-making in continuous state-action spaces is impossible without some assumptions. A common approach is to assume Lipschitz continuity of the Q-function. We show that, unfortunately, this property fails to hold in many typical domains. We propose a new coarse-grained smoothness definition that generalizes the notion of Lipschitz continuity, is more widely applicable, and allows us to compute significantly tighter bounds on Q-functions, leading to improved learning. We provide a theoretical analysis of our new smoothness definition, and discuss its implications and impact on control and exploration in continuous domains.
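As background for the Lipschitz assumption the abstract starts from, the sketch below shows how a Lipschitz constant turns a few known Q-values into upper and lower bounds at an unvisited point. It illustrates only the standard Lipschitz bound, not the paper's coarse-grained definition. The function name `lipschitz_bounds`, the Euclidean metric, and the sample values are assumptions chosen for the example.

```python
import math
from typing import List, Sequence, Tuple

# A sample pairs a point in the (state-action) metric space with a known Q-value.
Sample = Tuple[Sequence[float], float]


def lipschitz_bounds(
    query: Sequence[float], samples: List[Sample], lipschitz_constant: float
) -> Tuple[float, float]:
    """Bound Q(query) given |Q(x) - Q(y)| <= L * d(x, y) and known sample values.

    Uses Euclidean distance as the metric d (an assumption of this sketch).
    """
    upper = min(q + lipschitz_constant * math.dist(query, x) for x, q in samples)
    lower = max(q - lipschitz_constant * math.dist(query, x) for x, q in samples)
    return lower, upper


samples = [((0.0,), 1.0), ((2.0,), 2.0)]
print(lipschitz_bounds((1.0,), samples, lipschitz_constant=1.0))  # (1.0, 2.0)
```

The bounds widen linearly with distance from the nearest samples, so a large Lipschitz constant makes them loose; the abstract's claim is that a more general smoothness notion yields tighter ones.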

Model Selection's Disparate Impact in Real-World Deep Learning Applications

Apr 01, 2021

Jessica Zosa Forde, A. Feder Cooper, Kweku Kwegyir-Aggrey, Chris De Sa, Michael Littman

Abstract: Algorithmic fairness has emphasized the role of biased data in automated decision outcomes. Recently, there has been a shift in attention to sources of bias that implicate fairness in other stages in the ML pipeline. We contend that one source of such bias, human preferences in model selection, remains under-explored in terms of its role in disparate impact across demographic groups. Using a deep learning model trained on real-world medical imaging data, we verify our claim empirically and argue that choice of metric for model comparison can significantly bias model selection outcomes.

* Accepted to the Science and Engineering of Deep Learning Workshop, ICLR 2021

Task Scoping: Building Goal-Specific Abstractions for Planning in Complex Domains

Oct 17, 2020

Nishanth Kumar, Michael Fishman, Natasha Danas, Michael Littman, Stefanie Tellex, George Konidaris

Abstract: A generally intelligent agent requires an open-scope world model: one rich enough to tackle any of the wide range of tasks it may be asked to solve over its operational lifetime. Unfortunately, planning to solve any specific task using such a rich model is computationally intractable, even for state-of-the-art methods, due to the many states and actions that are necessarily present in the model but irrelevant to that problem. We propose task scoping: a method that exploits knowledge of the initial condition, goal condition, and transition-dynamics structure of a task to automatically and efficiently prune provably irrelevant factors and actions from a planning problem, which can dramatically decrease planning time. We prove that task scoping never deletes relevant factors or actions, characterize its computational complexity, and characterize the planning problems for which it is especially useful. Finally, we empirically evaluate task scoping on a variety of domains and demonstrate that using it as a pre-planning step can reduce the state-action space of various planning problems by orders of magnitude and speed up planning. When applied to a complex Minecraft domain, our approach speeds up a state-of-the-art planner by 30 times, including the time required for task scoping itself.
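The general idea of pruning factors and actions that cannot matter to a goal can be sketched with a simple backward relevance pass. This is a deliberately simplified illustration, not the authors' algorithm: it ignores the initial condition, which the abstract says task scoping also exploits, and it carries none of the paper's guarantees. The function `scope_task`, the action encoding, and the toy domain are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding: action name -> (precondition variables, affected variables).
Actions = Dict[str, Tuple[Set[str], Set[str]]]


def scope_task(actions: Actions, goal_vars: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (relevant variables, kept actions) via backward relevance from the goal.

    An action is kept if it affects a relevant variable; its preconditions
    then become relevant too. Repeat until nothing changes.
    """
    relevant = set(goal_vars)
    kept: Set[str] = set()
    changed = True
    while changed:
        changed = False
        for name, (preconditions, effects) in actions.items():
            if name not in kept and effects & relevant:
                kept.add(name)
                relevant |= preconditions
                changed = True
    return relevant, kept


actions = {
    "mine": ({"has_pickaxe"}, {"has_ore"}),
    "craft_pickaxe": ({"has_wood"}, {"has_pickaxe"}),
    "paint_house": ({"has_paint"}, {"house_painted"}),
}
relevant, kept = scope_task(actions, goal_vars={"has_ore"})
print(sorted(kept))      # ['craft_pickaxe', 'mine']
print(sorted(relevant))  # ['has_ore', 'has_pickaxe', 'has_wood']
```

In this toy domain the painting action and its variables are dropped before planning, which is the kind of reduction in the state-action space the abstract reports.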
