Title: Personalisation via Dynamic Policy Fusion

Sep 30, 2024

Ajsal Shereef Palattuparambil, Thommen George Karimpanal, Santu Rana

Abstract: Deep reinforcement learning (RL) policies, although optimal in terms of task rewards, may not align with the personal preferences of human users. A naive way to ensure this alignment is to retrain the agent using a reward function that encodes the user's specific preferences. However, such a reward function is typically not readily available, and retraining the agent from scratch can be prohibitively expensive. We propose a more practical approach: adapting the already trained policy to user-specific needs with the help of human feedback. To this end, we infer the user's intent through trajectory-level feedback and combine it with the trained task policy via a theoretically grounded dynamic policy fusion approach. Because our approach collects human feedback on the very same trajectories used to learn the task policy, it requires no additional interactions with the environment, making it a zero-shot approach. We empirically demonstrate in a number of environments that the proposed dynamic policy fusion approach consistently achieves the intended task while simultaneously adhering to user-specific needs.
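
The abstract does not spell out the fusion rule, so the following is only an illustrative sketch of one common way to combine two discrete action distributions: a temperature-weighted product that is then renormalised. The function name `fuse_policies`, the `intent_temperature` parameter, and the product form itself are assumptions made for illustration, not the authors' method. In a dynamic scheme the temperature would be adjusted over time; here it is simply passed in as an argument.

```python
import numpy as np


def fuse_policies(task_probs, intent_probs, intent_temperature=1.0, eps=1e-12):
    """Fuse two action distributions by a tempered product (hypothetical rule).

    task_probs:         action probabilities from the trained task policy
    intent_probs:       action probabilities from the inferred intent policy
    intent_temperature: > 0; larger values flatten the intent policy and
                        so reduce its influence on the fused result
    """
    task = np.asarray(task_probs, dtype=float)
    intent = np.asarray(intent_probs, dtype=float)
    if task.shape != intent.shape:
        raise ValueError("both policies must cover the same action set")
    if intent_temperature <= 0:
        raise ValueError("intent_temperature must be positive")

    # Temper the intent policy: p ** (1 / T), then renormalise.
    tempered = (intent + eps) ** (1.0 / intent_temperature)
    tempered /= tempered.sum()

    # Product of the two distributions, renormalised to sum to 1.
    fused = task * tempered
    return fused / fused.sum()


# Example inputs (made up): the task policy prefers action 0,
# the intent policy prefers action 1.
task = [0.7, 0.2, 0.1]
intent = [0.1, 0.8, 0.1]
fused_strong = fuse_policies(task, intent, intent_temperature=1.0)
fused_weak = fuse_policies(task, intent, intent_temperature=100.0)
```

With a low temperature the fused policy leans towards the user's inferred preference, and with a high temperature it stays close to the task policy. Varying that single knob is one simple way to trade task performance against personalisation.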
