Kyle Hollins Wray

Alliance Innovation Lab Silicon Valley

NS-Gym: Open-Source Simulation Environments and Benchmarks for Non-Stationary Markov Decision Processes

Jan 16, 2025

Nathaniel S. Keplinger, Baiting Luo, Iliyas Bektas, Yunuo Zhang, Kyle Hollins Wray, Aron Laszka, Abhishek Dubey, Ayan Mukhopadhyay

Abstract: In many real-world applications, agents must make sequential decisions in environments where conditions are subject to change due to various exogenous factors. These non-stationary environments pose significant challenges to traditional decision-making models, which typically assume stationary dynamics. Non-stationary Markov decision processes (NS-MDPs) offer a framework to model and solve decision problems under such changing conditions. However, the lack of standardized benchmarks and simulation tools has hindered systematic evaluation and progress in this field. We present NS-Gym, the first simulation toolkit designed explicitly for NS-MDPs, integrated within the popular Gymnasium framework. In NS-Gym, we segregate the evolution of the environmental parameters that characterize non-stationarity from the agent's decision-making module, allowing for modular and flexible adaptations to dynamic environments. We review prior work in this domain and present a toolkit encapsulating key problem characteristics and types in NS-MDPs. This toolkit is the first effort to develop a set of standardized interfaces and benchmark problems to enable consistent and reproducible evaluation of algorithms under non-stationary conditions. We also benchmark six algorithmic approaches from prior work on NS-MDPs using NS-Gym. Our vision is that NS-Gym will enable researchers to assess the adaptability and robustness of their decision-making algorithms to non-stationary conditions.

* 23 pages, 17 figures

Multi-Objective Policy Gradients with Topological Constraints

Sep 15, 2022

Kyle Hollins Wray, Stas Tiomkin, Mykel J. Kochenderfer, Pieter Abbeel

Abstract: Multi-objective optimization models that encode ordered sequential constraints provide a way to model various challenging problems, including encoding preferences, modeling a curriculum, and enforcing measures of safety. A recently developed theory of topological Markov decision processes (TMDPs) captures this range of problems for the case of discrete states and actions. In this work, we extend TMDPs towards continuous spaces and unknown transition dynamics by formulating, proving, and implementing the policy gradient theorem for TMDPs. This theoretical result enables the creation of TMDP learning algorithms that use function approximators and can generalize existing deep reinforcement learning (DRL) approaches. Specifically, we present a new algorithm for a policy gradient in TMDPs by a simple extension of the proximal policy optimization (PPO) algorithm. We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives, both in simulation and on a real robot.

Competence-Aware Path Planning via Introspective Perception

Sep 28, 2021

Sadegh Rabiee, Connor Basich, Kyle Hollins Wray, Shlomo Zilberstein, Joydeep Biswas

Abstract: Robots deployed in the real world over extended periods of time need to reason about unexpected failures, learn to predict them, and proactively take actions to avoid future failures. Existing approaches for competence-aware planning are either model-based, requiring explicit enumeration of known failure modes, or purely statistical, using state- and location-specific failure statistics to infer competence. We instead propose a structured model-free approach to competence-aware planning by reasoning about plan execution failures due to errors in perception, without requiring a priori enumeration of failure modes or location-specific failure statistics. We introduce competence-aware path planning via introspective perception (CPIP), a Bayesian framework to iteratively learn and exploit task-level competence in novel deployment environments. CPIP factorizes the competence-aware planning problem into two components. First, perception errors are learned in a model-free and location-agnostic setting via introspective perception prior to deployment in novel environments. Second, during actual deployments, the prediction of task-level failures is learned in a context-aware setting. Experiments in simulation show that the proposed CPIP approach outperforms the frequentist baseline in multiple mobile robot tasks, and the approach is further validated via real robot experiments in an environment with perceptually challenging obstacles and terrain.

* 8 pages, 8 figures

Improving Competence for Reliable Autonomy

Jul 23, 2020

Connor Basich, Justin Svegliato, Kyle Hollins Wray, Stefan J. Witwicki, Shlomo Zilberstein

Abstract: Given the complexity of real-world, unstructured domains, it is often impossible or impractical to design models that include every feature needed to handle all possible scenarios that an autonomous system may encounter. For an autonomous system to be reliable in such domains, it should have the ability to improve its competence online. In this paper, we propose a method for improving the competence of a system over the course of its deployment. We specifically focus on a class of semi-autonomous systems known as competence-aware systems that model their own competence -- the optimal extent of autonomy to use in any given situation -- and learn this competence over time from feedback received through interactions with a human authority. Our method exploits such feedback to identify important state features missing from the system's initial model, and incorporates them into its state representation. The result is an agent that better predicts human involvement, leading to improvements in its competence and reliability, and as a result, its overall performance.

* EPTCS 319, 2020, pp. 37-53
* In Proceedings AREA 2020, arXiv:2007.11260

Learning to Optimize Autonomy in Competence-Aware Systems

Mar 17, 2020

Connor Basich, Justin Svegliato, Kyle Hollins Wray, Stefan Witwicki, Joydeep Biswas, Shlomo Zilberstein

Abstract: Interest in semi-autonomous systems (SAS) is growing rapidly as a paradigm to deploy autonomous systems in domains that require occasional reliance on humans. This paradigm allows service robots or autonomous vehicles to operate at varying levels of autonomy and offer safety in situations that require human judgment. We propose an introspective model of autonomy that is learned and updated online through experience and dictates the extent to which the agent can act autonomously in any given situation. We define a competence-aware system (CAS) that explicitly models its own proficiency at different levels of autonomy and the available human feedback. A CAS learns to adjust its level of autonomy based on experience to maximize overall efficiency, factoring in the cost of human assistance. We analyze the convergence properties of CAS and provide experimental results for robot delivery and autonomous driving domains that demonstrate the benefits of the approach.

* To be published in Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2020). 9 pages

Planning in Stochastic Environments with Goal Uncertainty

Oct 18, 2018

Sandhya Saisubramanian, Kyle Hollins Wray, Luis Pineda, Shlomo Zilberstein

Abstract: We present the Goal Uncertain Stochastic Shortest Path (GUSSP) problem, a general framework to model stochastic environments with goal uncertainty. The model is an extension of the stochastic shortest path (SSP) framework to dynamic environments in which it is impossible to determine the exact goal states ahead of plan execution. GUSSPs introduce flexibility in goal specification by allowing a belief over possible goal configurations. The partial observability is restricted to goals, facilitating the reduction to an SSP. We formally define a GUSSP and discuss its theoretical properties. We then propose an admissible heuristic that reduces the planning time of FLARES, a state-of-the-art probabilistic planner. We also propose a determinization approach for solving this class of problems. Finally, we present empirical results using a mobile robot and three other problem domains.

* 8 pages, 5 figures
