Title: Value function interference and greedy action selection in value-based multi-objective reinforcement learning

Feb 09, 2024

Peter Vamplew, Cameron Foale, Richard Dazeley

Abstract: Multi-objective reinforcement learning (MORL) algorithms extend conventional reinforcement learning (RL) to the more general case of problems with multiple, conflicting objectives, represented by vector-valued rewards. Widely used scalar RL methods such as Q-learning can be modified to handle multiple objectives by (1) learning vector-valued value functions, and (2) performing action selection using a scalarisation or ordering operator which reflects the user's utility with respect to the different objectives. However, as we demonstrate here, if the user's utility function maps widely varying vector values to similar levels of utility, this can cause interference in the value function learned by the agent, leading to convergence to sub-optimal policies. This will be most prevalent in stochastic environments when optimising for the Expected Scalarised Return criterion, but we present a simple example showing that interference can also arise in deterministic environments. We demonstrate empirically that avoiding the use of random tie-breaking when identifying greedy actions can ameliorate, but not fully overcome, the problems caused by value function interference.
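The greedy action selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the thresholded utility, and the example Q-vectors are all hypothetical and chosen only to show how two very different value vectors can receive the same utility, and how the tie between them is then resolved either randomly or deterministically.

```python
import random
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def greedy_action(
    q_vectors: Sequence[Vector],
    utility: Callable[[Vector], float],
    random_ties: bool = False,
    rng: Optional[random.Random] = None,
    tol: float = 1e-9,
) -> int:
    """Return the index of an action whose Q-vector has maximal utility.

    q_vectors[a] is the vector-valued estimate for action a in one state.
    With random_ties=False the lowest-indexed maximal action is returned;
    with random_ties=True one of the tied actions is sampled uniformly.
    """
    utilities = [utility(q) for q in q_vectors]
    best = max(utilities)
    # Actions whose utility is within tol of the best are treated as tied.
    tied = [a for a, u in enumerate(utilities) if best - u <= tol]
    if random_ties and len(tied) > 1:
        chooser = rng if rng is not None else random
        return chooser.choice(tied)
    return tied[0]


def threshold_utility(q: Vector, threshold: float = 5.0) -> float:
    """Hypothetical utility: the first objective, capped at a threshold.

    Any vector whose first component reaches the threshold gets the same
    utility, regardless of how different the remaining components are.
    """
    return min(q[0], threshold)


# Hypothetical Q-vectors for three actions in a single state.
q_vectors = [(5.0, 0.0), (9.0, -10.0), (2.0, 3.0)]

# Actions 0 and 1 have very different vectors but identical utility (5.0).
deterministic_choice = greedy_action(q_vectors, threshold_utility)
random_choice = greedy_action(
    q_vectors, threshold_utility, random_ties=True, rng=random.Random(0)
)
```

Here the deterministic variant always picks action 0, whereas the random variant may pick either action 0 or action 1 on different calls. In a learning agent, the bootstrapped target for a preceding state would then alternate between two dissimilar vectors, which is one way to picture the interference the abstract refers to.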
