Title: DESTA: A Framework for Safe Reinforcement Learning with Markov Games of Intervention

Oct 27, 2021

David Mguni, Joel Jennings, Taher Jafferjee, Aivar Sootla, Yaodong Yang, Changmin Yu, Usman Islam, Ziyan Wang, Jun Wang

Abstract: Exploring in an unknown system can place an agent in dangerous situations, exposing it to potentially catastrophic hazards. Many current approaches for tackling safe learning in reinforcement learning (RL) lead to a trade-off between safe exploration and fulfilling the task. Although these methods may incur fewer safety violations, they often also reduce task performance. In this paper, we take the first step in introducing a generation of RL solvers that learn to minimise safety violations while maximising the task reward to the extent that can be tolerated by safe policies. Our approach uses a new two-player framework for safe RL called Distributive Exploration Safety Training Algorithm (DESTA). The core of DESTA is a novel game between two RL agents: a SAFETY AGENT, which is delegated the task of minimising safety violations, and a TASK AGENT, whose goal is to maximise the reward set by the environment task. The SAFETY AGENT can selectively take control of the system at any given point to prevent safety violations, while the TASK AGENT is free to execute its actions at all other states. This framework enables the SAFETY AGENT to learn to take actions that minimise future safety violations (during and after training) by performing safe actions at certain states, while the TASK AGENT performs actions that maximise task performance everywhere else. We demonstrate DESTA's ability to tackle challenging tasks and compare it against state-of-the-art RL methods on the Safety Gym benchmarks, which simulate real-world physical systems, and on OpenAI's Lunar Lander.
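The control hand-off described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: in DESTA both agents and the decision to intervene are learned, whereas here the policies, the intervention rule, the 1-D corridor environment, and every name (`step_with_intervention`, `rollout`, `HAZARD`, and so on) are hypothetical and hand-coded purely to show the switching structure.

```python
from typing import Callable, List, Tuple

State = int
Action = int

HAZARD = 5  # hypothetical unsafe position in a toy corridor with positions 0..5


def step_with_intervention(
    state: State,
    task_policy: Callable[[State], Action],
    safety_policy: Callable[[State], Action],
    should_intervene: Callable[[State], bool],
) -> Tuple[Action, bool]:
    """Return the action to execute and whether the safety agent took control."""
    if should_intervene(state):
        return safety_policy(state), True
    return task_policy(state), False


def task_policy(state: State) -> Action:
    # Assumption: the task agent greedily moves right, towards the reward.
    return +1


def safety_policy(state: State) -> Action:
    # Assumption: the safe action is to back away from the hazard.
    return -1


def near_hazard(state: State) -> bool:
    # Hand-coded stand-in for a learned intervention decision.
    return state + 1 >= HAZARD


def never(state: State) -> bool:
    # Baseline: the safety agent never takes control.
    return False


def rollout(
    start: State, horizon: int, should_intervene: Callable[[State], bool]
) -> Tuple[List[Tuple[State, bool]], int]:
    """Run the toy corridor and count the steps spent in the hazard state."""
    state, violations = start, 0
    trace: List[Tuple[State, bool]] = []
    for _ in range(horizon):
        action, intervened = step_with_intervention(
            state, task_policy, safety_policy, should_intervene
        )
        state = max(0, min(HAZARD, state + action))  # clamp to the corridor
        violations += int(state == HAZARD)
        trace.append((state, intervened))
    return trace, violations


if __name__ == "__main__":
    print(rollout(2, 4, near_hazard))
    print(rollout(2, 4, never))
```

In this toy setting the safety agent acts only at the single state adjacent to the hazard, and the task agent keeps control everywhere else, which mirrors the division of labour the abstract describes without claiming anything about how DESTA learns it.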

View paper on OpenReview
