Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Trusted Approximate Policy Iteration with Bisimulation Metrics

Feb 06, 2022

Mete Kemertas, Allan Jepson

Figure 1 for Trusted Approximate Policy Iteration with Bisimulation Metrics

Figure 2 for Trusted Approximate Policy Iteration with Bisimulation Metrics

Figure 3 for Trusted Approximate Policy Iteration with Bisimulation Metrics

Share this with someone who'll enjoy it:

Abstract:Bisimulation metrics define a distance measure between states of a Markov decision process (MDP) based on a comparison of reward sequences. Due to this property they provide theoretical guarantees in value function approximation. In this work we first prove that bisimulation metrics can be defined via any $p$-Wasserstein metric for $p\geq 1$. Then we describe an approximate policy iteration (API) procedure that uses $\epsilon$-aggregation with $\pi$-bisimulation and prove performance bounds for continuous state spaces. We bound the difference between $\pi$-bisimulation metrics in terms of the change in the policies themselves. Based on these theoretical results, we design an API($\alpha$) procedure that employs conservative policy updates and enjoys better performance bounds than the naive API approach. In addition, we propose a novel trust region approach which circumvents the requirement to explicitly solve a constrained optimization problem. Finally, we provide experimental evidence of improved stability compared to non-conservative alternatives in simulated continuous control.

View paper on

Share this with someone who'll enjoy it:

Title:Trusted Approximate Policy Iteration with Bisimulation Metrics

Paper and Code