Title: Reconciling Rewards with Predictive State Representations

Jun 07, 2021

Andrea Baisero, Christopher Amato

Abstract: Predictive state representations (PSRs) are models of controlled non-Markov observation sequences which exhibit the same generative process governing POMDP observations without relying on an underlying latent state. In that respect, a PSR is indistinguishable from the corresponding POMDP. However, PSRs notoriously ignore the notion of rewards, which undermines the general utility of PSR models for control, planning, or reinforcement learning. Therefore, we describe a sufficient and necessary accuracy condition which determines whether a PSR is able to accurately model POMDP rewards, we show that rewards can be approximated even when the accuracy condition is not satisfied, and we find that a non-trivial number of POMDPs taken from a well-known third-party repository do not satisfy the accuracy condition. We propose reward-predictive state representations (R-PSRs), a generalization of PSRs which accurately models both observations and rewards, and develop value iteration for R-PSRs. We show that there is a mismatch between optimal POMDP policies and the optimal PSR policies derived from approximate rewards. On the other hand, optimal R-PSR policies perfectly match optimal POMDP policies, reconfirming R-PSRs as accurate state-less generative models of observations and rewards.

* IJCAI 2021
