Title: Value Gradient weighted Model-Based Reinforcement Learning

Apr 04, 2022

Claas Voelcker, Victor Liao, Animesh Garg, Amir-massoud Farahmand

Abstract: Model-based reinforcement learning (MBRL) is a sample-efficient technique to obtain control policies, yet unavoidable modeling errors often lead to performance deterioration. The model in MBRL is often fitted solely to reconstruct dynamics, state observations in particular, while the impact of model error on the policy is not captured by the training objective. This leads to a mismatch between the intended goal of MBRL, enabling good policy and value learning, and the target of the loss function employed in practice, future state prediction. Naive intuition would suggest that value-aware model learning would fix this problem and, indeed, several solutions to this objective mismatch problem have been proposed based on theoretical analysis. However, they tend to be inferior in practice to commonly used maximum likelihood (MLE) based approaches. In this paper we propose Value-gradient weighted Model Learning (VaGraM), a novel method for value-aware model learning which improves the performance of MBRL in challenging settings, such as small model capacity and the presence of distracting state dimensions. We analyze both MLE and value-aware approaches, demonstrate how they fail to account for exploration and the behavior of function approximation when learning value-aware models, and highlight the additional goals that must be met to stabilize optimization in the deep learning setting. We verify our analysis by showing that our loss function is able to achieve high returns on the MuJoCo benchmark suite while being more robust than maximum likelihood based approaches.
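
The abstract does not state the loss itself, so the following is only a minimal sketch of one plausible reading of "value-gradient weighted": each state dimension's squared prediction error is scaled by the squared partial derivative of the value function at the true next state. The toy value function, its weights `W`, and all function names are hypothetical and chosen purely for illustration; this is not the authors' implementation.

```python
# Hypothetical toy value function: V(s) = -0.5 * sum(w_i * s_i**2).
# The last weight is 0, so the last state dimension is a "distracting"
# dimension that has no influence on the value.
W = [1.0, 2.0, 0.0]


def value_gradient(s):
    """Analytic gradient of the toy value function with respect to s."""
    return [-w * x for w, x in zip(W, s)]


def mse_loss(pred_next, true_next):
    """Plain squared-error model loss (Gaussian MLE with fixed variance,
    up to constants)."""
    return sum((p - t) ** 2 for p, t in zip(pred_next, true_next))


def value_gradient_weighted_loss(pred_next, true_next):
    """Per-dimension squared error scaled by the squared value gradient at
    the true next state. This form is an assumption, see the lead-in."""
    grad = value_gradient(true_next)
    return sum((g * (p - t)) ** 2 for g, p, t in zip(grad, pred_next, true_next))


true_next = [1.0, 1.0, 1.0]
pred_bad_distractor = [1.0, 1.0, 3.0]  # error only in the distracting dimension
pred_bad_relevant = [1.0, 2.0, 1.0]    # error in a value-relevant dimension

for name, pred in [("distractor error", pred_bad_distractor),
                   ("relevant error", pred_bad_relevant)]:
    print(name,
          "| MSE:", mse_loss(pred, true_next),
          "| weighted:", value_gradient_weighted_loss(pred, true_next))
```

In this toy setting the squared-error loss penalizes the distractor error more heavily (4.0 versus 1.0), while the weighted loss ignores it entirely and concentrates on the value-relevant dimension (0.0 versus 4.0). This illustrates the kind of mismatch between state prediction and value learning that the abstract describes.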

View paper on OpenReview
