Martin Chmelik

Learning Algorithms for Verification of Markov Decision Processes

Mar 20, 2024

Tomáš Brázdil, Krishnendu Chatterjee, Martin Chmelik, Vojtěch Forejt, Jan Křetínský, Marta Kwiatkowska, Tobias Meggendorfer, David Parker, Mateusz Ujma

Abstract: We present a general framework for applying learning algorithms and heuristic guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive exploration of the state space, instead focussing on particularly relevant areas of the system, guided by heuristics. Our work builds on the previous results of Brázdil et al., significantly extending them as well as refining several details and fixing errors. The presented framework focuses on probabilistic reachability, which is a core problem in verification, and is instantiated in two distinct scenarios. The first assumes that full knowledge of the MDP is available, in particular precise transition probabilities. It performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP without knowing the exact transition dynamics. Here, we obtain probabilistic guarantees, again in terms of both lower and upper bounds, which provide efficient stopping criteria for the approximation. In particular, the latter is an extension of statistical model-checking (SMC) for unbounded properties in MDPs. In contrast to other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor do we assume any particular structural properties of the MDP.
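The idea of maintaining both a lower and an upper bound on a reachability probability can be illustrated with a minimal sketch. This is not the paper's algorithm: it sweeps over all states instead of exploring heuristically, and it assumes the MDP has no end components other than the absorbing target and sink states (otherwise the upper bound need not converge). The toy MDP, the function name `bounded_value_iteration`, and the dictionary layout are hypothetical choices made for illustration.

```python
def bounded_value_iteration(mdp, target, sink, init, eps=1e-6):
    """Iterate lower/upper bounds on the maximal probability of reaching `target`.

    mdp maps each state to {action: {successor: probability}}.
    Assumption: target and sink states are absorbing, and there are no
    other end components, so both bounds converge to the same value.
    """
    lower = {s: 1.0 if s in target else 0.0 for s in mdp}
    upper = {s: 0.0 if s in sink else 1.0 for s in mdp}
    while upper[init] - lower[init] > eps:
        for s, actions in mdp.items():
            if s in target or s in sink:
                continue  # absorbing states keep their fixed bounds
            lower[s] = max(sum(p * lower[t] for t, p in dist.items())
                           for dist in actions.values())
            upper[s] = max(sum(p * upper[t] for t, p in dist.items())
                           for dist in actions.values())
    return lower[init], upper[init]


# Hypothetical toy MDP: state 2 is the target, state 3 is a sink.
mdp = {
    0: {"a": {1: 0.5, 3: 0.5}, "b": {2: 0.3, 3: 0.7}},
    1: {"a": {2: 0.4, 1: 0.5, 3: 0.1}},
    2: {},
    3: {},
}
lo, hi = bounded_value_iteration(mdp, target={2}, sink={3}, init=0)
print(lo, hi)
```

The gap `hi - lo` serves as the stopping criterion: once it drops below `eps`, the true value is known up to that precision without any a priori bound on the number of iterations.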

Sensor Synthesis for POMDPs with Reachability Objectives

Sep 29, 2017

Krishnendu Chatterjee, Martin Chmelik, Ufuk Topcu

Abstract: Partially observable Markov decision processes (POMDPs) are widely used in probabilistic planning problems in which an agent interacts with an environment using noisy and imprecise sensors. We study a setting in which the sensors are only partially defined and the goal is to synthesize "weakest" additional sensors, such that in the resulting POMDP, there is a small-memory policy for the agent that almost-surely (with probability 1) satisfies a reachability objective. We show that the problem is NP-complete, and present a symbolic algorithm by encoding the problem into SAT instances. We illustrate trade-offs between the amount of memory of the policy and the number of additional sensors on a simple example. We have implemented our approach, evaluated it on three classical POMDP examples from the literature, and show that in all the examples the number of sensors can be significantly decreased (as compared to the existing solutions in the literature) without increasing the complexity of the policies.

* arXiv admin note: text overlap with arXiv:1511.08456

A Symbolic SAT-based Algorithm for Almost-sure Reachability with Small Strategies in POMDPs

Nov 26, 2015

Krishnendu Chatterjee, Martin Chmelik, Jessica Davies

Abstract: POMDPs are standard models for probabilistic planning problems, where an agent interacts with an uncertain environment. We study the problem of almost-sure reachability, where given a set of target states, the question is to decide whether there is a policy to ensure that the target set is reached with probability 1 (almost-surely). While in general the problem is EXPTIME-complete, in many practical cases policies with a small amount of memory suffice. Moreover, the existing solution to the problem is explicit: it first requires explicitly constructing an exponential reduction to a belief-support MDP. In this work, we first study the existence of observation-stationary strategies, which is NP-complete, and then small-memory strategies. We present a symbolic algorithm based on an efficient encoding to SAT, solved with a SAT solver. We report experimental results demonstrating the scalability of our symbolic (SAT-based) approach.
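To make the notion of an observation-stationary strategy concrete, here is a small sketch that searches for one by brute-force enumeration. It deliberately replaces the paper's SAT encoding with exhaustive search, so it is exponential and only suitable for toy inputs. It relies on the fact that, for almost-sure reachability with a memoryless strategy, only the support of the randomized choice in each observation matters, so a candidate strategy is a nonempty action set per observation. The example POMDP and all function names are hypothetical.

```python
from itertools import chain, combinations, product


def nonempty_subsets(items):
    """All nonempty subsets of items, smallest first."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, k) for k in range(1, len(items) + 1))


def reachable(succ, start):
    """States reachable from start in the graph given by succ."""
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for t in succ(s):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def almost_sure_strategy(trans, obs, actions, init, target):
    """Return a support per observation that reaches target with probability 1.

    trans[s][a] is the set of successors of s under a with positive probability.
    Returns None if no observation-stationary strategy works.
    """
    observations = sorted(set(obs.values()))
    candidates = [list(nonempty_subsets(actions)) for _ in observations]
    for choice in product(*candidates):
        support = dict(zip(observations, choice))

        def succ(s):
            if s in target:
                return set()  # treat target states as absorbing
            return {t for a in support[obs[s]] for t in trans[s][a]}

        # In a finite Markov chain the target is reached almost surely iff
        # every state reachable from init can still reach the target.
        if all(reachable(succ, s) & target for s in reachable(succ, init)):
            return support
    return None


# Hypothetical POMDP: s1 and s2 share observation "o" but need different actions.
trans = {
    "s0": {"a": {"s1", "s2"}, "b": {"s1", "s2"}},
    "s1": {"a": {"goal"}, "b": {"s1"}},
    "s2": {"a": {"s2"}, "b": {"goal"}},
    "goal": {"a": {"goal"}, "b": {"goal"}},
}
obs = {"s0": "init", "s1": "o", "s2": "o", "goal": "goal"}
strategy = almost_sure_strategy(trans, obs, ["a", "b"], "s0", {"goal"})
print(strategy)
```

In this example no deterministic choice for observation "o" works, because the agent cannot tell `s1` from `s2`; randomizing over both actions does, which is why the search ranges over supports rather than single actions.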

* Full version of "A Symbolic SAT-based Algorithm for Almost-sure Reachability with Small Strategies in POMDPs" AAAI 2016

POMDPs under Probabilistic Semantics

Aug 09, 2014

Krishnendu Chatterjee, Martin Chmelik

Abstract: We consider partially observable Markov decision processes (POMDPs) with limit-average payoff, where a reward value in the interval [0,1] is associated with every transition, and the payoff of an infinite path is the long-run average of the rewards. We consider two types of path constraints: (i) a quantitative constraint defines the set of paths where the payoff is at least a given threshold lambda_1 in (0,1]; and (ii) a qualitative constraint is the special case of a quantitative constraint with lambda_1=1. We consider the computation of the almost-sure winning set, where the controller needs to ensure that the path constraint is satisfied with probability 1. Our main results for qualitative path constraints are as follows: (i) the problem of deciding the existence of a finite-memory controller is EXPTIME-complete; and (ii) the problem of deciding the existence of an infinite-memory controller is undecidable. For quantitative path constraints we show that the problem of deciding the existence of a finite-memory controller is undecidable.
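The limit-average payoff and the two path constraints can be illustrated on ultimately periodic paths, that is, a finite prefix followed by a cycle repeated forever. For such a path the long-run average exists and equals the mean reward on the cycle, so the prefix has no influence. The sketch below is only an illustration of the definitions, not of the decision procedures in the paper; the function names and the example rewards are hypothetical.

```python
from fractions import Fraction


def prefix_average(prefix, cycle, n):
    """Average reward over the first n transitions of prefix . cycle^omega."""
    rewards = [
        prefix[i] if i < len(prefix) else cycle[(i - len(prefix)) % len(cycle)]
        for i in range(n)
    ]
    return sum(rewards, Fraction(0)) / n


def limit_average(cycle):
    """Long-run average of an ultimately periodic path: the mean over its cycle."""
    return sum(cycle, Fraction(0)) / len(cycle)


def satisfies(cycle, threshold):
    """Path constraint: payoff at least threshold (threshold 1 is the qualitative case)."""
    return limit_average(cycle) >= threshold


# Hypothetical path: two zero-reward transitions, then rewards 1, 1/2 repeated forever.
prefix = [Fraction(0), Fraction(0)]
cycle = [Fraction(1), Fraction(1, 2)]
print(limit_average(cycle))                   # mean reward on the cycle
print(float(prefix_average(prefix, cycle, 1002)))  # finite averages approach it
print(satisfies(cycle, Fraction(3, 4)), satisfies(cycle, Fraction(1)))
```

The qualitative constraint (threshold 1) is met only by cycles whose rewards are all 1, which shows how much more restrictive it is than a quantitative constraint with a smaller threshold.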

* Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI2013)
