Vision-language models (VLMs) have gained traction as auxiliary reward models that provide more informative reward signals in sparse-reward environments. However, our work reveals a critical vulnerability of this approach: a small amount of noise in the reward signal can severely degrade agent performance. In challenging sparse-reward environments, we show that reinforcement learning agents using VLM-based reward models without proper noise handling perform worse than agents relying solely on exploration-driven methods. We hypothesize that false positive rewards -- where the reward model incorrectly assigns rewards to trajectories that do not fulfill the given instruction -- are more detrimental to learning than false negatives. Our analysis confirms this hypothesis, revealing that the widely used cosine similarity metric, when used to compare agent trajectories with language instructions, is prone to generating false positive reward signals. To address this, we introduce BiMI (Binary Mutual Information), a novel noise-resilient reward function. Our experiments demonstrate that BiMI significantly boosts agent performance, with an average improvement of 44.5\% across diverse environments with learned, non-oracle VLMs, thereby making VLM-based reward models practical for real-world applications.
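For concreteness, the cosine-similarity reward discussed above can be sketched as follows; the encoders $f_\theta$ (trajectory) and $g_\theta$ (instruction) and the symbols $\tau_t$, $\ell$ are illustrative notation, not definitions taken from this paper:
\[
r_t \;=\; \cos\!\big(f_\theta(\tau_t),\, g_\theta(\ell)\big)
\;=\; \frac{f_\theta(\tau_t)^{\top} g_\theta(\ell)}{\lVert f_\theta(\tau_t)\rVert \,\lVert g_\theta(\ell)\rVert}.
\]
Because this similarity is dense and rarely exactly zero, trajectories unrelated to $\ell$ can still receive nonzero reward, which is consistent with the false-positive failure mode identified above; the BiMI formulation itself is specified in the main text rather than here.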