We study the problem of synthesizing a policy that maximizes the entropy of a Markov decision process (MDP) subject to a temporal logic constraint. Such a policy minimizes the predictability of the paths it generates, or, dually, maximizes the continual exploration of different paths in an MDP while ensuring the satisfaction of a temporal logic specification. We first show that the maximum entropy of an MDP can be finite, infinite, or unbounded, and we provide necessary and sufficient conditions distinguishing the three cases. We then present an algorithm to synthesize a policy that maximizes the entropy of an MDP. The proposed algorithm is based on a convex optimization problem and runs in time polynomial in the size of the MDP. We also show that maximizing the entropy of an MDP is equivalent to maximizing the entropy of the paths that reach a certain set of states in the MDP. Finally, we extend the algorithm to an MDP subject to a temporal logic specification. In numerical examples, we demonstrate the proposed method on different motion planning scenarios and illustrate that as a specification imposes more restrictions on the paths, the maximum entropy decreases, which, in turn, increases the predictability of the paths.
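To make the convex formulation concrete, the following is an illustrative sketch of one standard program of this kind, not necessarily the exact formulation developed in the paper. The notation is assumed here rather than taken from the text above: $\lambda(s,a)$ denotes an occupancy-measure variable (the expected number of times action $a$ is taken in state $s$), $\mathcal{P}(s,a,s')$ the transition probabilities, and $\mu$ the initial state distribution.
\begin{align*}
\underset{\lambda \geq 0}{\text{maximize}} \quad
  & -\sum_{s \in S} \sum_{a \in A} \sum_{s' \in S}
      \lambda(s,a)\,\mathcal{P}(s,a,s')
      \log \frac{\lambda(s,a)\,\mathcal{P}(s,a,s')}
                {\sum_{a' \in A} \lambda(s,a')} \\
\text{subject to} \quad
  & \sum_{a \in A} \lambda(s,a)
    - \sum_{s' \in S} \sum_{a \in A} \lambda(s',a)\,\mathcal{P}(s',a,s)
    = \mu(s) \quad \text{for all } s \in S.
\end{align*}
Each term of the objective is a negated relative-entropy (perspective) function of expressions that are affine in $\lambda$, so the objective is concave and the program is convex, consistent with the polynomial-time claim above. A complete treatment would also have to handle recurrent states, whose expected residence times are infinite; that distinction underlies the finite/infinite/unbounded trichotomy described above.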