Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sebastian Junges

\textsc{rfPG}: Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs

May 14, 2025

Maris F. L. Galesloot, Roman Andriushchenko, Milan Češka, Sebastian Junges, Nils Jansen

Abstract:Partially observable Markov decision processes (POMDPs) model specific environments in sequential decision-making under uncertainty. Critically, optimal policies for POMDPs may not be robust against perturbations in the environment. Hidden-model POMDPs (HM-POMDPs) capture sets of different environment models, that is, POMDPs with a shared action and observation space. The intuition is that the true model is hidden among a set of potential models, and it is unknown which model will be the environment at execution time. A policy is robust for a given HM-POMDP if it achieves sufficient performance for each of its POMDPs. We compute such robust policies by combining two orthogonal techniques: (1) a deductive formal verification technique that supports tractable robust policy evaluation by computing a worst-case POMDP within the HM-POMDP and (2) subgradient ascent to optimize the candidate policy for a worst-case POMDP. The empirical evaluation shows that, compared to various baselines, our approach (1) produces policies that are more robust and generalize better to unseen POMDPs and (2) scales to HM-POMDPs that consist of over a hundred thousand environments.

* Accepted for publication at IJCAI 2025

Via

Access Paper or Ask Questions

Tighter Value-Function Approximations for POMDPs

Feb 10, 2025

Merlijn Krale, Wietze Koops, Sebastian Junges, Thiago D. Simão, Nils Jansen

Abstract:Solving partially observable Markov decision processes (POMDPs) typically requires reasoning about the values of exponentially many state beliefs. Towards practical performance, state-of-the-art solvers use value bounds to guide this reasoning. However, sound upper value bounds are often computationally expensive to compute, and there is a tradeoff between the tightness of such bounds and their computational cost. This paper introduces new and provably tighter upper value bounds than the commonly used fast informed bound. Our empirical evaluation shows that, despite their additional computational overhead, the new upper bounds accelerate state-of-the-art POMDP solvers on a wide range of benchmarks.

* AAMAS 2025 submission

Via

Access Paper or Ask Questions

State Matching and Multiple References in Adaptive Active Automata Learning

Jun 28, 2024

Loes Kruger, Sebastian Junges, Jurriaan Rot

Abstract:Active automata learning (AAL) is a method to infer state machines by interacting with black-box systems. Adaptive AAL aims to reduce the sample complexity of AAL by incorporating domain specific knowledge in the form of (similar) reference models. Such reference models appear naturally when learning multiple versions or variants of a software system. In this paper, we present state matching, which allows flexible use of the structure of these reference models by the learner. State matching is the main ingredient of adaptive L#, a novel framework for adaptive learning, built on top of L#. Our empirical evaluation shows that adaptive L# improves the state of the art by up to two orders of magnitude.

* Extended paper for FM 2024

Via

Access Paper or Ask Questions

Learning-Based Verification of Stochastic Dynamical Systems with Neural Network Policies

Jun 02, 2024

Thom Badings, Wietze Koops, Sebastian Junges, Nils Jansen

Figure 1 for Learning-Based Verification of Stochastic Dynamical Systems with Neural Network Policies

Figure 2 for Learning-Based Verification of Stochastic Dynamical Systems with Neural Network Policies

Figure 3 for Learning-Based Verification of Stochastic Dynamical Systems with Neural Network Policies

Figure 4 for Learning-Based Verification of Stochastic Dynamical Systems with Neural Network Policies

Abstract:We consider the verification of neural network policies for reach-avoid control tasks in stochastic dynamical systems. We use a verification procedure that trains another neural network, which acts as a certificate proving that the policy satisfies the task. For reach-avoid tasks, it suffices to show that this certificate network is a reach-avoid supermartingale (RASM). As our main contribution, we significantly accelerate algorithmic approaches for verifying that a neural network is indeed a RASM. The main bottleneck of these approaches is the discretization of the state space of the dynamical system. The following two key contributions allow us to use a coarser discretization than existing approaches. First, we present a novel and fast method to compute tight upper bounds on Lipschitz constants of neural networks based on weighted norms. We further improve these bounds on Lipschitz constants based on the characteristics of the certificate network. Second, we integrate an efficient local refinement scheme that dynamically refines the state space discretization where necessary. Our empirical evaluation shows the effectiveness of our approach for verifying neural network policies in several benchmarks and trained with different reinforcement learning algorithms.

Via

Access Paper or Ask Questions

Approximate Dec-POMDP Solving Using Multi-Agent A*

May 09, 2024

Wietze Koops, Sebastian Junges, Nils Jansen

Abstract:We present an A*-based algorithm to compute policies for finite-horizon Dec-POMDPs. Our goal is to sacrifice optimality in favor of scalability for larger horizons. The main ingredients of our approach are (1) using clustered sliding window memory, (2) pruning the A* search tree, and (3) using novel A* heuristics. Our experiments show competitive performance to the state-of-the-art. Moreover, for multiple benchmarks, we achieve superior performance. In addition, we provide an A* algorithm that finds upper bounds for the optimum, tailored towards problems with long horizons. The main ingredient is a new heuristic that periodically reveals the state, thereby limiting the number of reachable beliefs. Our experiments demonstrate the efficacy and scalability of the approach.

* 19 pages, 3 figures. Extended version (with appendix) of the paper to appear in IJCAI 2024

Via

Access Paper or Ask Questions

Imprecise Probabilities Meet Partial Observability: Game Semantics for Robust POMDPs

May 08, 2024

Eline M. Bovy, Marnix Suilen, Sebastian Junges, Nils Jansen

Figure 1 for Imprecise Probabilities Meet Partial Observability: Game Semantics for Robust POMDPs

Figure 2 for Imprecise Probabilities Meet Partial Observability: Game Semantics for Robust POMDPs

Figure 3 for Imprecise Probabilities Meet Partial Observability: Game Semantics for Robust POMDPs

Figure 4 for Imprecise Probabilities Meet Partial Observability: Game Semantics for Robust POMDPs

Abstract:Partially observable Markov decision processes (POMDPs) rely on the key assumption that probability distributions are precisely known. Robust POMDPs (RPOMDPs) alleviate this concern by defining imprecise probabilities, referred to as uncertainty sets. While robust MDPs have been studied extensively, work on RPOMDPs is limited and primarily focuses on algorithmic solution methods. We expand the theoretical understanding of RPOMDPs by showing that 1) different assumptions on the uncertainty sets affect optimal policies and values; 2) RPOMDPs have a partially observable stochastic game (POSG) semantic; and 3) the same RPOMDP with different assumptions leads to semantically different POSGs and, thus, different policies and values. These novel semantics for RPOMDPS give access to results for the widely studied POSG model; concretely, we show the existence of a Nash equilibrium. Finally, we classify the existing RPOMDP literature using our semantics, clarifying under which uncertainty assumptions these existing works operate.

* Accepted at IJCAI 2024

Via

Access Paper or Ask Questions

Factored Online Planning in Many-Agent POMDPs

Dec 22, 2023

Maris F. L. Galesloot, Thiago D. Simão, Sebastian Junges, Nils Jansen

Figure 1 for Factored Online Planning in Many-Agent POMDPs

Figure 2 for Factored Online Planning in Many-Agent POMDPs

Figure 3 for Factored Online Planning in Many-Agent POMDPs

Figure 4 for Factored Online Planning in Many-Agent POMDPs

Abstract:In centralized multi-agent systems, often modeled as multi-agent partially observable Markov decision processes (MPOMDPs), the action and observation spaces grow exponentially with the number of agents, making the value and belief estimation of single-agent online planning ineffective. Prior work partially tackles value estimation by exploiting the inherent structure of multi-agent settings via so-called coordination graphs. Additionally, belief estimation has been improved by incorporating the likelihood of observations into the approximation. However, the challenges of value estimation and belief estimation have only been tackled individually, which prevents existing methods from scaling to many agents. Therefore, we address these challenges simultaneously. First, we introduce weighted particle filtering to a sample-based online planner for MPOMDPs. Second, we present a scalable approximation of the belief. Third, we bring an approach that exploits the typical locality of agent interactions to novel online planning algorithms for MPOMDPs operating on a so-called sparse particle filter tree. Our experimental evaluation against several state-of-the-art baselines shows that our methods (1) are competitive in settings with only a few agents and (2) improve over the baselines in the presence of many agents.

* Extended version (includes the Appendix) of the paper accepted at AAAI-24

Via

Access Paper or Ask Questions

Learning Formal Specifications from Membership and Preference Queries

Jul 19, 2023

Ameesh Shah, Marcell Vazquez-Chanlatte, Sebastian Junges, Sanjit A. Seshia

Figure 1 for Learning Formal Specifications from Membership and Preference Queries

Figure 2 for Learning Formal Specifications from Membership and Preference Queries

Figure 3 for Learning Formal Specifications from Membership and Preference Queries

Figure 4 for Learning Formal Specifications from Membership and Preference Queries

Abstract:Active learning is a well-studied approach to learning formal specifications, such as automata. In this work, we extend active specification learning by proposing a novel framework that strategically requests a combination of membership labels and pair-wise preferences, a popular alternative to membership labels. The combination of pair-wise preferences and membership labels allows for a more flexible approach to active specification learning, which previously relied on membership labels only. We instantiate our framework in two different domains, demonstrating the generality of our approach. Our results suggest that learning from both modalities allows us to robustly and conveniently identify specifications via membership and preferences.

* 6 pages, Presented at ICML 2023 Workshop on The Many Facets of Preference-Based Learning

Via

Access Paper or Ask Questions

Efficient Sensitivity Analysis for Parametric Robust Markov Chains

May 01, 2023

Thom Badings, Sebastian Junges, Ahmadreza Marandi, Ufuk Topcu, Nils Jansen

Abstract:We provide a novel method for sensitivity analysis of parametric robust Markov chains. These models incorporate parameters and sets of probability distributions to alleviate the often unrealistic assumption that precise probabilities are available. We measure sensitivity in terms of partial derivatives with respect to the uncertain transition probabilities regarding measures such as the expected reward. As our main contribution, we present an efficient method to compute these partial derivatives. To scale our approach to models with thousands of parameters, we present an extension of this method that selects the subset of $k$ parameters with the highest partial derivative. Our methods are based on linear programming and differentiating these programs around a given value for the parameters. The experiments show the applicability of our approach on models with over a million states and thousands of parameters. Moreover, we embed the results within an iterative learning scheme that profits from having access to a dedicated sensitivity analysis.

* To be presented at CAV 2023

Via

Access Paper or Ask Questions

COOL-MC: A Comprehensive Tool for Reinforcement Learning and Model Checking

Sep 15, 2022

Dennis Gross, Nils Jansen, Sebastian Junges, Guillermo A. Perez

Figure 1 for COOL-MC: A Comprehensive Tool for Reinforcement Learning and Model Checking

Abstract:This paper presents COOL-MC, a tool that integrates state-of-the-art reinforcement learning (RL) and model checking. Specifically, the tool builds upon the OpenAI gym and the probabilistic model checker Storm. COOL-MC provides the following features: (1) a simulator to train RL policies in the OpenAI gym for Markov decision processes (MDPs) that are defined as input for Storm, (2) a new model builder for Storm, which uses callback functions to verify (neural network) RL policies, (3) formal abstractions that relate models and policies specified in OpenAI gym or Storm, and (4) algorithms to obtain bounds on the performance of so-called permissive policies. We describe the components and architecture of COOL-MC and demonstrate its features on multiple benchmark environments.

Via

Access Paper or Ask Questions