Abstract:In this paper, we present an online reinforcement learning algorithm for constrained Markov decision processes with a safety constraint. Despite the necessary attention of the scientific community, considering stochastic stopping time, the problem of learning optimal policy without violating safety constraints during the learning phase is yet to be addressed. To this end, we propose an algorithm based on linear programming that does not require a process model. We show that the learned policy is safe with high confidence. We also propose a method to compute a safe baseline policy, which is central in developing algorithms that do not violate the safety constraints. Finally, we provide simulation results to show the efficacy of the proposed algorithm. Further, we demonstrate that efficient exploration can be achieved by defining a subset of the state-space called proxy set.
Abstract:We study N-player finite games with costs perturbed due to time-varying disturbances in the underlying system and to that end we propose the concept of Robust Correlated Equilibrium that generalizes the definition of Correlated Equilibrium. Conditions under which the Robust Correlated Equilibrium exists are specified and a decentralized algorithm for learning strategies that are optimal in the sense of Robust Correlated Equilibrium is proposed. The primary contribution of the paper is the convergence analysis of the algorithm and to that end, we propose an extension of the celebrated Blackwell's Approachability theorem to games with costs that are not just time-average as in the original Blackwell's Approachability Theorem but also include time-average of previous algorithm iterates. The designed algorithm is applied to a practical water distribution network with pumps being the controllers and their costs being perturbed by uncertain consumption by consumers. Simulation results show that each controller achieves no regret and empirical distributions converge to the Robust Correlated Equilibrium.