Tomáš Brázdil

Learning Algorithms for Verification of Markov Decision Processes

Mar 20, 2024

Tomáš Brázdil, Krishnendu Chatterjee, Martin Chmelik, Vojtěch Forejt, Jan Křetínský, Marta Kwiatkowska, Tobias Meggendorfer, David Parker, Mateusz Ujma

Abstract: We present a general framework for applying learning algorithms and heuristic guidance to the verification of Markov decision processes (MDPs). The primary goal of our techniques is to improve performance by avoiding an exhaustive exploration of the state space, instead focussing on particularly relevant areas of the system, guided by heuristics. Our work builds on the previous results of Brázdil et al., significantly extending them as well as refining several details and fixing errors. The presented framework focuses on probabilistic reachability, which is a core problem in verification, and is instantiated in two distinct scenarios. The first assumes that full knowledge of the MDP is available, in particular precise transition probabilities. It performs a heuristic-driven partial exploration of the model, yielding precise lower and upper bounds on the required probability. The second tackles the case where we may only sample the MDP without knowing the exact transition dynamics. Here, we obtain probabilistic guarantees, again in terms of both the lower and upper bounds, which provide efficient stopping criteria for the approximation. In particular, the latter is an extension of statistical model-checking (SMC) for unbounded properties in MDPs. In contrast to other related approaches, we do not restrict our attention to time-bounded (finite-horizon) or discounted properties, nor do we assume any particular structural properties of the MDP.
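The idea of bracketing a reachability probability between a lower and an upper bound can be illustrated with a minimal sketch. This is not the paper's algorithm: it explores every state exhaustively, uses no heuristics or sampling, and assumes the MDP has no end components apart from the absorbing target and sink (otherwise the upper bound may not converge). The function name `interval_iteration`, the dictionary encoding, and the three-state example model are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: state -> list of actions,
# each action being a list of (probability, successor) pairs.
MDP = Dict[str, List[List[Tuple[float, str]]]]

def interval_iteration(mdp: MDP, target: str, sink: str,
                       eps: float = 1e-6, max_iters: int = 10_000):
    """Iterate lower and upper bounds on the maximal probability of
    reaching `target`, stopping once they differ by less than `eps`.
    Assumes `target` and `sink` are the only end components."""
    lower = {s: 0.0 for s in mdp}
    upper = {s: 1.0 for s in mdp}
    lower[target] = 1.0   # the target is reached with certainty
    upper[sink] = 0.0     # the sink can never reach the target
    for _ in range(max_iters):
        for s, actions in mdp.items():
            if s in (target, sink):
                continue
            # Bellman update for maximal reachability, applied to both bounds.
            lower[s] = max(sum(p * lower[t] for p, t in a) for a in actions)
            upper[s] = max(sum(p * upper[t] for p, t in a) for a in actions)
        # The gap between the bounds serves as the stopping criterion.
        if max(upper[s] - lower[s] for s in mdp) < eps:
            break
    return lower, upper

# Toy model (an assumption, not taken from the paper).
EXAMPLE_MDP: MDP = {
    "s0": [
        [(0.5, "goal"), (0.5, "sink")],               # action a
        [(0.3, "goal"), (0.6, "s0"), (0.1, "sink")],  # action b
    ],
    "goal": [],
    "sink": [],
}

if __name__ == "__main__":
    lo, hi = interval_iteration(EXAMPLE_MDP, "goal", "sink")
    print(lo["s0"], hi["s0"])
```

In this toy model, always choosing action b satisfies v = 0.3 + 0.6 v, so the true value at `s0` is 0.75, and both bounds close in on it from opposite sides. The gap between them is what makes a sound stopping criterion possible.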

Synthesizing Efficient Solutions for Patrolling Problems in the Internet Environment

May 10, 2018

Tomáš Brázdil, Antonín Kučera, Vojtěch Řehák

Abstract: We propose an algorithm for constructing efficient patrolling strategies in the Internet environment, where the protected targets are nodes connected to the network and the patrollers are software agents capable of detecting/preventing undesirable activities on the nodes. The algorithm is based on a novel compositional principle designed for a special class of strategies, and it can quickly construct (sub)optimal solutions even if the number of targets reaches hundreds of millions.

Stochastic Shortest Path with Energy Constraints in POMDPs

May 11, 2016

Tomáš Brázdil, Krishnendu Chatterjee, Martin Chmelík, Anchit Gupta, Petr Novotný

Abstract: We consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend the traditional framework of POMDPs to model energy consumption, which represents a hard constraint. The energy levels may increase and decrease with transitions, and the hard constraint requires that the energy level remain positive in all steps until the target is reached. First, we present a novel algorithm for solving POMDPs with energy levels, building on existing POMDP solvers and using RTDP as its main method. Our second contribution is related to policy representation. For larger POMDP instances, the policies computed by existing solvers are too large to be understandable. We present an automated procedure based on machine learning techniques that extracts the important decisions of the policy, allowing us to compute succinct, human-readable policies. Finally, we show experimentally that our algorithm performs well and computes succinct policies on a number of POMDP instances from the literature that were naturally enhanced with energy levels.

* Technical report accompanying a paper published in proceedings of AAMAS 2016
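The hard energy constraint described in the abstract above can be made concrete with a small sketch. It only checks a single, already-fixed run (a sequence of transitions), not a POMDP policy, and it is not the paper's algorithm. The function name `run_cost_if_feasible` and the encoding of a transition as a `(cost, energy_delta)` pair are hypothetical.

```python
from typing import List, Optional, Tuple

def run_cost_if_feasible(initial_energy: int,
                         steps: List[Tuple[int, int]]) -> Optional[int]:
    """Return the total cost of a run that ends in the target, or None
    if the energy level fails to stay positive at some step.

    Each step is a hypothetical (cost, energy_delta) pair: `cost` is the
    positive integer cost of the transition, and `energy_delta` is the
    (possibly negative) change in energy it causes."""
    level = initial_energy
    if level <= 0:
        return None  # the constraint is already violated at the start
    total_cost = 0
    for cost, delta in steps:
        level += delta
        if level <= 0:
            return None  # hard constraint: energy must remain positive
        total_cost += cost
    return total_cost

if __name__ == "__main__":
    # Two transitions, each consuming one unit of energy.
    print(run_cost_if_feasible(3, [(2, -1), (1, -1)]))
    print(run_cost_if_feasible(2, [(2, -1), (1, -1)]))
```

With three units of initial energy the run stays feasible and its cost is the sum of the transition costs. With two units the level drops to zero on the second step, so the run is rejected regardless of how cheap it is. This is what distinguishes a hard constraint from a cost penalty.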

MultiGain: A controller synthesis tool for MDPs with multiple mean-payoff objectives

Jan 13, 2015

Tomáš Brázdil, Krishnendu Chatterjee, Vojtěch Forejt, Antonín Kučera

Abstract: We present MultiGain, a tool to synthesize strategies for Markov decision processes (MDPs) with multiple mean-payoff objectives. Our models are described in PRISM, and our tool uses the existing interface and simulator of PRISM. Our tool extends PRISM by adding novel algorithms for multiple mean-payoff objectives, and also provides features such as (i) generating strategies and exploring them for simulation, and checking them with respect to other properties; and (ii) generating an approximate Pareto curve for two mean-payoff objectives. In addition, we present a new practical algorithm for the analysis of MDPs with multiple mean-payoff objectives under memoryless strategies.

* Extended version for a TACAS 2015 tool demo paper
