Abstract: Markov decision processes (MDPs) are a fundamental model for decision making under uncertainty. They exhibit non-deterministic choice as well as probabilistic uncertainty. Traditionally, verification algorithms assume exact knowledge of the probabilities that govern the behaviour of an MDP. As this assumption is often unrealistic in practice, statistical model checking (SMC) was developed over the past two decades. It makes it possible to analyse MDPs with unknown transition probabilities and to provide probably approximately correct (PAC) guarantees on the result. Model-based SMC algorithms sample the MDP and build a model of it by estimating all transition probabilities, essentially answering for every transition the question: ``What are the odds?'' However, the statistical methods employed by state-of-the-art SMC algorithms have so far been quite naive. Our contribution consists of several fundamental improvements to those methods: on the one hand, we survey the statistics literature for better concentration inequalities; on the other hand, we propose specialised approaches that exploit our knowledge of the MDP. Our improvements apply to many kinds of problem statements because they are largely independent of the setting. Moreover, our experimental evaluation shows that they lead to significant gains, reducing the number of samples that the SMC algorithm has to collect by up to two orders of magnitude.
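As an illustration only (not the paper's improved bounds), the following minimal sketch shows the kind of per-transition PAC estimation that model-based SMC performs, here with the standard two-sided Hoeffding inequality; the function name and the numbers in the example are hypothetical.

```python
import math

def hoeffding_interval(successes: int, samples: int, delta: float) -> tuple[float, float]:
    """PAC interval for a single transition probability.

    With probability at least 1 - delta, the true probability lies in the
    returned interval (two-sided Hoeffding bound for Bernoulli samples).
    """
    if samples == 0:
        return 0.0, 1.0  # no information collected yet
    p_hat = successes / samples
    width = math.sqrt(math.log(2.0 / delta) / (2.0 * samples))
    return max(0.0, p_hat - width), min(1.0, p_hat + width)

# Hypothetical example: 300 samples of (s, a), 90 of which reached successor s'.
lo, hi = hoeffding_interval(successes=90, samples=300, delta=0.01)
print(f"p(s' | s, a) in [{lo:.3f}, {hi:.3f}] with confidence 0.99")
```

Tighter concentration inequalities or MDP-specific arguments, as surveyed in the abstract above, would shrink the interval width for the same number of samples.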
Abstract: In an offline reinforcement learning setting, the safe policy improvement (SPI) problem aims to improve the performance of the behavior policy that generated the sample data. State-of-the-art approaches to SPI require a high number of samples to provide practical probabilistic guarantees on the improved policy's performance. We present a novel approach to the SPI problem that requires less data for such guarantees. Specifically, to prove the correctness of these guarantees, we devise implicit transformations on the data set and the underlying environment model that serve as theoretical foundations for deriving tighter improvement bounds for SPI. Our empirical evaluation on standard benchmarks, using the well-established SPI with baseline bootstrapping (SPIBB) algorithm, shows that our method indeed significantly reduces the sample complexity of the SPIBB algorithm.
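For context, here is a one-state sketch of the SPIBB mechanism that the abstract builds on: actions observed fewer than a threshold number of times are "bootstrapped" and keep their baseline probabilities, while the remaining mass goes to the best estimated well-sampled action. The function name, threshold, and numbers are illustrative assumptions, not the paper's tightened bounds.

```python
import numpy as np

def spibb_policy_for_state(pi_b: np.ndarray, q_hat: np.ndarray,
                           counts: np.ndarray, n_wedge: int) -> np.ndarray:
    """Sketch of SPIBB-style policy improvement for a single state.

    Actions with fewer than n_wedge observations are bootstrapped: the
    improved policy copies the baseline probabilities pi_b for them. The
    remaining probability mass is given to the best estimated
    non-bootstrapped action.
    """
    bootstrapped = counts < n_wedge
    pi = np.where(bootstrapped, pi_b, 0.0)
    free_mass = 1.0 - pi.sum()
    if free_mass > 0 and (~bootstrapped).any():
        q_masked = np.where(bootstrapped, -np.inf, q_hat)
        pi[int(np.argmax(q_masked))] += free_mass
    else:
        pi = pi_b.copy()  # nothing is well-sampled: fall back to the baseline
    return pi

# Hypothetical example: three actions, the third is rarely observed.
pi = spibb_policy_for_state(pi_b=np.array([0.5, 0.3, 0.2]),
                            q_hat=np.array([1.0, 2.0, 5.0]),
                            counts=np.array([40, 35, 3]), n_wedge=10)
print(pi)  # mass of the well-sampled actions concentrates on action 1
```

The proposed approach keeps this bootstrapping scheme but, by transforming the data set and environment model, certifies the same guarantees with smaller observation counts.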
Abstract: A central task in control theory, artificial intelligence, and formal methods is to synthesize reward-maximizing strategies for agents that operate in partially unknown environments. In environments modeled by gray-box Markov decision processes (MDPs), the impact of the agents' actions is known in terms of successor states, but not the stochastics involved. In this paper, we devise a strategy synthesis algorithm for gray-box MDPs via reinforcement learning that uses interval MDPs as its internal model. To cope with the limited sampling access in reinforcement learning, we incorporate two novel concepts into our algorithm, focusing on rapid and successful learning rather than on stochastic guarantees and optimality: lower confidence bound exploration reinforces variants of already learned practical strategies, and action scoping reduces the learning action space to promising actions. We illustrate the benefits of our algorithm by means of a prototypical implementation applied to examples from the AI and formal methods communities.
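The following is a minimal sketch, under assumed data structures, of how the two concepts named in the abstract can interact: per-action lower and upper value bounds (as an interval MDP would provide), selection of the action with the highest lower confidence bound, and scoping out actions whose upper bound is dominated. It is an illustration of the idea, not the authors' exact algorithm; all names and numbers are hypothetical.

```python
def select_action(q_lo: dict, q_hi: dict, scoped_out: set):
    """LCB-style action selection with action scoping (illustrative sketch).

    q_lo/q_hi map actions to lower/upper confidence bounds on their value.
    Actions whose upper bound falls below the best lower bound are scoped
    out and never considered again; among the rest, the action with the
    highest lower confidence bound is chosen, reinforcing strategies that
    already look practical.
    """
    candidates = [a for a in q_lo if a not in scoped_out]
    best_lcb = max(q_lo[a] for a in candidates)
    # Action scoping: permanently discard dominated actions.
    for a in candidates:
        if q_hi[a] < best_lcb:
            scoped_out.add(a)
    candidates = [a for a in candidates if a not in scoped_out]
    return max(candidates, key=lambda a: q_lo[a])

# Hypothetical example: action 'c' is dominated because its upper bound
# lies below the best lower bound and is therefore scoped out.
scoped = set()
chosen = select_action(q_lo={'a': 0.6, 'b': 0.4, 'c': 0.1},
                       q_hi={'a': 0.9, 'b': 0.8, 'c': 0.5},
                       scoped_out=scoped)
print(chosen, scoped)  # -> a {'c'}
```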