Title: Contrastive Explanations for Comparing Preferences of Reinforcement Learning Agents

Dec 17, 2021

Jasmina Gajcin, Rahul Nair, Tejaswini Pedapati, Radu Marinescu, Elizabeth Daly, Ivana Dusparic

Abstract: In complex tasks where the reward function is not straightforward and consists of a set of objectives, multiple reinforcement learning (RL) policies that perform the task adequately but employ different strategies can be trained by adjusting the impact of individual objectives on the reward function. Understanding the differences in strategies between policies is necessary to enable users to choose between the offered policies, and can help developers understand the different behaviors that emerge from various reward functions and training hyperparameters in RL systems. In this work, we compare the behavior of two policies trained on the same task but with different preferences over objectives. We propose a method for distinguishing differences in behavior that stem from different abilities from those that are a consequence of opposing preferences of the two RL agents. Furthermore, we use only data on preference-based differences to generate contrasting explanations about the agents' preferences. Finally, we test and evaluate our approach on an autonomous driving task and compare the behavior of a safety-oriented policy and one that prefers speed.
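
As a rough illustration of what "adjusting the impact of individual objectives on the reward function" could look like, the sketch below combines per-objective reward terms with a weighted sum. The abstract does not state how the objectives are combined, so the weighted-sum form, the function name `scalarized_reward`, the objective names, and all numeric values are assumptions made for illustration only, not the authors' method.

```python
from typing import Dict


def scalarized_reward(objectives: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-objective reward terms into one scalar via a weighted sum.

    Assumption: a linear combination is used here purely for illustration.
    """
    return sum(weights[name] * value for name, value in objectives.items())


# Hypothetical per-step objective values for a driving task:
# the agent is moving fast (positive speed term) but close to another
# vehicle (negative safety term).
step = {"speed": 0.8, "safety": -0.5}

# Two hypothetical preference settings over the same objectives.
safety_oriented = {"speed": 0.2, "safety": 1.0}
speed_oriented = {"speed": 1.0, "safety": 0.2}

# The same transition is scored differently under each preference setting.
r_safe = scalarized_reward(step, safety_oriented)
r_fast = scalarized_reward(step, speed_oriented)

print(f"safety-oriented reward: {r_safe:.2f}")
print(f"speed-oriented reward:  {r_fast:.2f}")
```

Under these made-up numbers the same step is penalized by the safety-oriented weighting and rewarded by the speed-oriented one, which is the kind of preference-driven divergence the abstract sets apart from differences in ability.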

* 7 pages, 3 figures
