Title: SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics

Apr 20, 2022

Yannis Flet-Berliac, Debabrota Basu

Abstract: Although Reinforcement Learning (RL) is effective for sequential decision-making problems under uncertainty, it still fails to thrive in real-world systems where risk or safety is a binding constraint. In this paper, we formulate the RL problem with safety constraints as a non-zero-sum game. When deployed with maximum entropy RL, this formulation leads to a safe adversarially guided soft actor-critic framework, called SAAC. In SAAC, the adversary aims to break the safety constraint while the RL agent aims to maximize the constrained value function given the adversary's policy. The safety constraint on the agent's value function manifests only as a repulsion term between the agent's and the adversary's policies. Unlike previous approaches, SAAC can address different safety criteria such as safe exploration, mean-variance risk sensitivity, and CVaR-like coherent risk sensitivity. We illustrate the design of the adversary for these constraints. Then, in each of these variations, we show that the agent differentiates itself from the adversary's unsafe actions in addition to learning to solve the task. Finally, for challenging continuous control tasks, we demonstrate that SAAC achieves faster convergence, better efficiency, and fewer failures to satisfy the safety constraints than risk-averse distributional RL and risk-neutral soft actor-critic algorithms.
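The abstract does not spell out the exact form of the repulsion term or of the risk criteria, so the following is only a minimal sketch under stated assumptions. It assumes diagonal Gaussian policies (as in standard soft actor-critic) and models the repulsion as a KL divergence from the agent's policy to the adversary's, weighted by a hypothetical coefficient `repulsion_coef`. It also shows textbook empirical versions of the mean-variance and CVaR criteria named above. All function names here are illustrative and are not taken from the paper or any library.

```python
import math
from typing import Sequence


def kl_diag_gaussian(mu_p: Sequence[float], std_p: Sequence[float],
                     mu_q: Sequence[float], std_q: Sequence[float]) -> float:
    """Closed-form KL(p || q) between diagonal Gaussians, summed over action dims."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2 * sq ** 2) - 0.5
    return total


def agent_actor_loss(q_value: float, log_prob: float, temperature: float,
                     repulsion_coef: float, kl_to_adversary: float) -> float:
    """SAC-style actor loss (temperature * log pi - Q) plus a HYPOTHETICAL repulsion term.

    Subtracting repulsion_coef * KL means that minimizing the loss pushes the
    agent's policy away from the adversary's. This is an assumed form, not the
    paper's confirmed objective.
    """
    return temperature * log_prob - q_value - repulsion_coef * kl_to_adversary


def mean_variance(returns: Sequence[float], beta: float) -> float:
    """Empirical mean-variance criterion: mean(R) - beta * Var(R)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / n
    return mean - beta * var


def cvar(returns: Sequence[float], alpha: float) -> float:
    """Empirical CVaR_alpha: average of the worst ceil(alpha * n) returns (lower tail)."""
    worst_first = sorted(returns)
    k = max(1, math.ceil(alpha * len(worst_first)))
    return sum(worst_first[:k]) / k


if __name__ == "__main__":
    kl = kl_diag_gaussian([0.0], [1.0], [1.0], [1.0])
    print(kl)  # 0.5
    print(agent_actor_loss(q_value=1.0, log_prob=-0.5, temperature=0.2,
                           repulsion_coef=0.1, kl_to_adversary=kl))
    print(mean_variance([1.0, 2.0, 3.0], beta=0.5))
    print(cvar([1.0, 2.0, 3.0, 4.0], alpha=0.5))  # 1.5
```

With the repulsion term written this way, a larger divergence from the adversary lowers the agent's loss, which is one simple way to express "differentiates itself from the adversary's unsafe actions"; the paper itself may use a different divergence or weighting.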
