Abstract: Modern reinforcement learning systems produce many high-quality policies throughout the learning process. However, to choose which policy to actually deploy in the real world, they must be tested under an intractable number of environmental conditions. We introduce RPOSST, an algorithm to select a small set of test cases from a larger pool based on a relatively small number of sample evaluations. RPOSST treats the test case selection problem as a two-player game and optimizes a solution with provable $k$-of-$N$ robustness, bounding the error relative to a test that used all the test cases in the pool. Empirical results demonstrate that RPOSST finds a small set of test cases that identify high quality policies in a toy one-shot game, poker datasets, and a high-fidelity racing simulator.
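The $k$-of-$N$ robustness idea can be illustrated with a minimal sketch: judge each candidate test subset by how badly its scores deviate from the full-pool scores on the $k$ worst of $N$ sampled evaluations, and grow the subset greedily. This is only an illustration of the robustness criterion, not the RPOSST two-player-game optimization itself; the function name `select_tests`, the greedy selection rule, and the random data are assumptions.

```python
import numpy as np

def select_tests(eval_samples, num_tests, k):
    """Greedy, k-of-N robust test-case selection (illustrative sketch only).

    eval_samples: array of shape (N, P, T) -- N sampled evaluation matrices,
                  each scoring P candidate policies on T pooled test cases.
    num_tests:    how many test cases to keep.
    k:            number of worst samples the selection must be robust to.
    """
    N, P, T = eval_samples.shape
    full_scores = eval_samples.mean(axis=2)          # (N, P): scores under the full pool
    chosen = []
    for _ in range(num_tests):
        best_col, best_err = None, np.inf
        for t in range(T):
            if t in chosen:
                continue
            cols = chosen + [t]
            small_scores = eval_samples[:, :, cols].mean(axis=2)              # (N, P)
            per_sample_err = np.abs(small_scores - full_scores).max(axis=1)   # (N,)
            # k-of-N robustness: judge a candidate by its k worst sampled worlds.
            worst_k_err = np.sort(per_sample_err)[-k:].mean()
            if worst_k_err < best_err:
                best_col, best_err = t, worst_k_err
        chosen.append(best_col)
    return chosen

# Hypothetical usage: 20 sampled evaluations of 5 policies on a pool of 30 tests.
rng = np.random.default_rng(0)
samples = rng.normal(size=(20, 5, 30))
print(select_tests(samples, num_tests=3, k=5))
```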
Abstract: Matching pursuit (MP) methods are a promising class of feature construction algorithms for value function approximation. Yet existing MP methods require creating a pool of potential features, mandating expert knowledge or enumeration of a large feature pool, both of which hinder scalability. This paper introduces batch incremental feature dependency discovery (Batch-iFDD) as an MP method that inherits a provable convergence property. Additionally, Batch-iFDD does not require a large pool of features, leading to lower computational complexity. Empirical policy evaluation results across three domains with up to one million states highlight the scalability of Batch-iFDD over the previous state-of-the-art MP algorithm.
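The core iFDD idea of growing the feature set by conjoining existing binary features, rather than enumerating a pool in advance, can be sketched as below. This is a simplified illustration of dependency discovery for batch policy evaluation, not the Batch-iFDD algorithm itself; the function name `grow_features`, the residual-based relevance score, and the least-squares refit are assumptions.

```python
import numpy as np
from itertools import combinations

def grow_features(phi, targets, num_rounds):
    """Illustrative iFDD-style feature growth for batch policy evaluation.

    phi:     (S, F) binary feature matrix over sampled states.
    targets: (S,) regression targets (e.g., Monte Carlo returns).
    Repeatedly adds the conjunction of two existing features whose joint
    activation best explains the current residual error.
    """
    phi = phi.astype(float).copy()
    for _ in range(num_rounds):
        # Fit a linear value estimate and measure per-state residuals.
        w, *_ = np.linalg.lstsq(phi, targets, rcond=None)
        residual = targets - phi @ w
        # Score every candidate conjunction by how much residual error
        # its joint activation could absorb (a "relevance" heuristic).
        best_pair, best_score = None, 0.0
        for i, j in combinations(range(phi.shape[1]), 2):
            joint = phi[:, i] * phi[:, j]
            score = abs(residual @ joint)
            if score > best_score:
                best_pair, best_score = (i, j), score
        if best_pair is None:
            break
        i, j = best_pair
        phi = np.column_stack([phi, phi[:, i] * phi[:, j]])
    return phi
```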
Abstract: We describe theoretical bounds and a practical algorithm for teaching a model by demonstration in a sequential decision making environment. Unlike previous efforts that have optimized learners that watch a teacher demonstrate a static policy, we focus on the teacher as a decision maker who can dynamically choose different policies to teach different parts of the environment. We develop several teaching frameworks based on previously defined supervised protocols, such as Teaching Dimension, extending them to handle noise and sequences of inputs encountered in an MDP. We provide theoretical bounds on the learnability of several important model classes in this setting and suggest a practical algorithm for dynamic teaching.
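For context, the supervised Teaching Dimension protocol that the paper extends asks for the smallest set of labeled examples that uniquely identifies a target hypothesis within a finite class. A brute-force version over a finite instance space is sketched below; the noisy, sequential MDP extensions described in the abstract are not reflected here, and the function name `minimal_teaching_set` and the threshold-classifier example are assumptions.

```python
from itertools import combinations

def minimal_teaching_set(instances, hypotheses, target):
    """Smallest set of labeled examples that distinguishes `target` from every
    other hypothesis in a finite class (classic supervised Teaching Dimension).

    instances:  list of inputs.
    hypotheses: list of functions mapping an input to a label.
    target:     the hypothesis the teacher wants the learner to identify.
    """
    for size in range(len(instances) + 1):
        for subset in combinations(instances, size):
            labeled = [(x, target(x)) for x in subset]
            consistent = [h for h in hypotheses
                          if all(h(x) == y for x, y in labeled)]
            # The teaching set succeeds if only the target remains consistent.
            if all(h is target for h in consistent):
                return labeled
    return None

# Hypothetical usage: threshold classifiers on {0, 1, 2, 3}.
hyps = [lambda x, t=t: int(x >= t) for t in range(5)]
target = hyps[2]
print(minimal_teaching_set([0, 1, 2, 3], hyps, target))  # two examples suffice
```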
Abstract: This paper presents a new algorithm for online linear regression whose efficiency guarantees satisfy the requirements of the KWIK (Knows What It Knows) framework. The algorithm improves on the complexity bounds of the current state-of-the-art procedure in this setting. We explore several applications of this algorithm for learning compact reinforcement-learning representations. We show that KWIK linear regression can be used to learn the reward function of a factored MDP and the probabilities of action outcomes in Stochastic STRIPS and Object-Oriented MDPs, none of which had previously been shown to be efficiently learnable in the RL setting. We also combine KWIK linear regression with other KWIK learners to learn larger portions of these models, including experiments on learning factored MDP transition and reward functions together.
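The KWIK contract requires the learner to either return an accurate prediction or explicitly answer "I don't know," only then requesting the true label. A simplified KWIK-style linear regressor is sketched below as a stand-in for the paper's algorithm; the class name `KWIKLinearRegressor`, the ridge-based coverage test, and the threshold value are assumptions rather than the paper's construction.

```python
import numpy as np

class KWIKLinearRegressor:
    """Illustrative KWIK-style linear regression: predict only when the query
    lies in a well-covered direction of past inputs, otherwise return None
    ("I don't know") and wait to observe the true label."""

    def __init__(self, dim, threshold=0.1, reg=1.0):
        self.X = np.empty((0, dim))   # observed inputs
        self.y = np.empty(0)          # observed labels
        self.threshold = threshold    # coverage required before committing
        self.reg = reg                # ridge regularizer

    def predict(self, x):
        A = self.X.T @ self.X + self.reg * np.eye(len(x))
        # Uncertainty proxy: large x^T A^{-1} x means x points into a direction
        # with little data, so the KWIK-compliant answer is "I don't know".
        uncertainty = float(x @ np.linalg.solve(A, x))
        if uncertainty > self.threshold:
            return None
        w = np.linalg.solve(A, self.X.T @ self.y)   # ridge regression weights
        return float(x @ w)

    def observe(self, x, y):
        # Labels are only requested (and stored) after an "I don't know".
        self.X = np.vstack([self.X, x])
        self.y = np.append(self.y, y)

# Hypothetical usage: learn a 3-dimensional linear reward function online.
learner = KWIKLinearRegressor(dim=3)
x = np.array([1.0, 0.0, 0.0])
if learner.predict(x) is None:       # learner admits it does not know yet
    learner.observe(x, y=0.5)        # so the true label is revealed and stored
```

In the factored-MDP use case the abstract describes, one could imagine instantiating one such learner per reward or transition component and directing exploration toward inputs where the learner still answers "I don't know"; the exact composition with other KWIK learners is specified in the paper, not here.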