Abstract: Continuous-time Markov decision processes (CTMDPs) are canonical models for sequential decision-making in dense-time, stochastic environments. When the stochastic evolution of the environment is only available via sampling, model-free reinforcement learning (RL) is the approach of choice for computing optimal decision sequences. RL, however, requires the learning objective to be encoded as scalar reward signals. Since performing such translations manually is both tedious and error-prone, a number of techniques have been proposed to translate high-level objectives (expressed in logic or automata formalisms) into scalar rewards for discrete-time Markov decision processes (MDPs). Unfortunately, no automatic translation exists for CTMDPs. We consider CTMDP environments with learning objectives expressed as omega-regular languages. Omega-regular languages generalize regular languages to infinite-horizon specifications and can express properties given in the popular linear-time temporal logic LTL. To accommodate the dense-time nature of CTMDPs, we consider two semantics of omega-regular objectives: 1) satisfaction semantics, where the goal of the learner is to maximize the probability of spending positive time in the good states, and 2) expectation semantics, where the goal of the learner is to optimize the long-run expected average time spent in the good states of the automaton. We present an approach that enables a correct translation to scalar reward signals, which can be readily used by off-the-shelf RL algorithms for CTMDPs. We demonstrate the effectiveness of the proposed algorithms by evaluating them on popular CTMDP benchmarks with omega-regular objectives.
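To illustrate the kind of translation this abstract describes, the sketch below runs a deterministic automaton for the omega-regular objective alongside a sampled CTMDP trajectory and pays out the sojourn time spent in accepting automaton states as a scalar reward, roughly in the spirit of the expectation semantics. This is a minimal illustration, not the paper's construction; the automaton interface (`initial_state`, `delta`, `accepting`) and the environment's `step` signature are assumptions.

```python
# A minimal sketch (not the paper's construction): turn an omega-regular
# objective, given as a deterministic automaton, into a scalar reward signal
# along a sampled CTMDP trajectory. The automaton and environment interfaces
# below are hypothetical.

class OmegaRegularRewardShaper:
    def __init__(self, automaton):
        self.automaton = automaton
        self.q = automaton.initial_state

    def reset(self):
        self.q = self.automaton.initial_state

    def reward(self, state_label, sojourn_time):
        """Advance the automaton on the label of the visited CTMDP state and
        pay out the sojourn time whenever the automaton is in a good state,
        approximating the long-run average-time (expectation) semantics."""
        self.q = self.automaton.delta(self.q, state_label)
        return sojourn_time if self.automaton.accepting(self.q) else 0.0


def rollout(ctmdp_env, policy, shaper, num_steps):
    """Collect (state, action, reward) triples usable by an off-the-shelf RL
    algorithm; ctmdp_env.step is assumed to return the next state, its label,
    and the sojourn time spent in the current state."""
    shaper.reset()
    trajectory = []
    state = ctmdp_env.reset()
    for _ in range(num_steps):
        action = policy(state)
        next_state, label, sojourn_time = ctmdp_env.step(action)
        trajectory.append((state, action, shaper.reward(label, sojourn_time)))
        state = next_state
    return trajectory
```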
Abstract: Markov decision processes (MDPs) and continuous-time MDPs (CTMDPs) are fundamental models for non-deterministic systems with probabilistic uncertainty. Mean payoff (also known as long-run average reward) is one of the classic objectives considered in their context. We provide the first algorithm to compute the mean payoff probably approximately correctly in unknown MDPs, and we further extend it to unknown CTMDPs. We do not require any knowledge of the state space, only a lower bound on the minimum transition probability, which has been advocated in the literature. In addition to providing probably approximately correct (PAC) bounds for our algorithm, we also demonstrate its practical nature by running experiments on standard benchmarks.
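As a small, hedged illustration of the PAC ingredient mentioned above (not the paper's algorithm), the snippet below estimates a single transition probability of an unknown MDP from samples and attaches a Hoeffding-style confidence interval; such intervals are a typical building block from which probably approximately correct value bounds are assembled. The function name and interface are hypothetical.

```python
import math

def estimate_transition_with_confidence(indicator_samples, delta=0.05):
    """Estimate an unknown transition probability from 0/1 samples and return
    (lower, estimate, upper), where the true probability lies inside the
    interval with probability at least 1 - delta by Hoeffding's inequality."""
    n = len(indicator_samples)
    p_hat = sum(indicator_samples) / n
    width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return max(0.0, p_hat - width), p_hat, min(1.0, p_hat + width)

# Example: 100 simulated visits, with the transition observed 38 times.
low, est, high = estimate_transition_with_confidence([1] * 38 + [0] * 62)
print(low, est, high)  # roughly 0.24, 0.38, 0.52 for delta = 0.05
```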
Abstract: In this paper, we investigate the combination of synthesis and learning techniques to obtain safe and near-optimal schedulers for a preemptible task scheduling problem. We study both model-based learning techniques with PAC guarantees and model-free learning techniques based on shielded deep Q-learning. We have implemented the new learning algorithms and report on their experimental evaluation.
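The shielding idea mentioned in this abstract can be sketched as follows: a shield, computed by synthesis, restricts action selection to actions it deems safe in the current state. The sketch below is tabular and minimal rather than the paper's deep Q-learning setup, and the `safe_actions` interface standing in for the shield is an assumption.

```python
import random

def shielded_epsilon_greedy(Q, state, actions, safe_actions, epsilon=0.1):
    """Epsilon-greedy action selection restricted to shield-approved actions.
    Q is a dict mapping (state, action) to an estimated value; safe_actions(state)
    stands in for a shield obtained from synthesis."""
    allowed = [a for a in actions if a in safe_actions(state)]
    if random.random() < epsilon:
        return random.choice(allowed)          # explore among safe actions only
    return max(allowed, key=lambda a: Q.get((state, a), 0.0))  # greedy, but safe
```

In a deep variant, the same filtering is typically applied to the network's Q-value outputs before taking the argmax, so that exploration and exploitation both stay within the shield.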
Abstract: In this paper, we consider algorithms to decide the existence of strategies in MDPs for Boolean combinations of objectives. These objectives are omega-regular properties that need to be enforced either surely, almost surely, existentially, or with non-zero probability. In this setting, the relevant strategies are randomized infinite-memory strategies: both infinite memory and randomization may be needed to play optimally. We provide algorithms to solve the general case of Boolean combinations and also investigate relevant subcases. We further report on complexity bounds for these problems.