Title: Leveraging human Domain Knowledge to model an empirical Reward function for a Reinforcement Learning problem

Sep 16, 2019

Dattaraj Rao


Abstract: Traditional Reinforcement Learning (RL) problems depend on an exhaustive simulation environment that models the real-world physics of the problem and trains the RL agent by observing this environment. In this paper, we present a novel approach to creating an environment by modeling the reward function on empirical rules extracted from human domain knowledge of the system under study. Using this empirical reward function, we build an environment and train the agent. We first create an environment that emulates the effect of setting cabin temperature through a thermostat. This is typically done in RL problems by creating an exhaustive model of the system from a detailed thermodynamic study. Instead, we propose an empirical approach that models the reward function on human domain knowledge. We document some rules of thumb that humans usually apply when setting a thermostat temperature and model these in our reward function. This modeling of empirical human domain rules into a reward function for RL is the unique aspect of this paper. This is a continuous action space problem, and we use the deep deterministic policy gradient (DDPG) method to maximize the reward function. We create a policy network that predicts the optimal temperature setpoint given the external temperature and humidity.
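To make the idea of an empirical reward function concrete, the following is a minimal Python sketch of how rules of thumb could be encoded as a reward and wrapped in a small environment. It is not the paper's code: the function names, the `ThermostatEnv` class, and every threshold and constant are hypothetical values chosen only for illustration. The state is (external temperature, humidity) and the continuous action is the setpoint, which is the quantity a DDPG policy network would output.

```python
import random
from dataclasses import dataclass

# All constants below are illustrative assumptions, not values from the paper.
BASE_COMFORT_C = 22.0             # assumed comfortable setpoint in mild, dry weather
SETPOINT_RANGE_C = (16.0, 30.0)   # assumed valid thermostat range


def preferred_setpoint(outside_temp_c: float, humidity_pct: float) -> float:
    """Hypothetical rule of thumb for the setpoint a person might pick."""
    target = BASE_COMFORT_C
    if outside_temp_c > 30.0:
        target -= 1.0   # hot day: assume people want the cabin a bit cooler
    elif outside_temp_c < 10.0:
        target += 1.0   # cold day: assume people want the cabin a bit warmer
    if humidity_pct > 70.0:
        target -= 0.5   # humid air feels warmer, so nudge the target down
    return target


def empirical_reward(setpoint_c: float, outside_temp_c: float,
                     humidity_pct: float) -> float:
    """Reward is highest (zero) at the rule-of-thumb target and falls off
    quadratically as the chosen setpoint moves away from it."""
    target = preferred_setpoint(outside_temp_c, humidity_pct)
    return -(setpoint_c - target) ** 2


@dataclass
class ThermostatEnv:
    """Tiny stand-in environment: no physics, only the empirical reward."""
    seed: int = 0

    def __post_init__(self):
        self._rng = random.Random(self.seed)
        self.reset()

    def reset(self):
        # Sample a new (external temperature in C, relative humidity in %).
        self.state = (self._rng.uniform(-5.0, 40.0),
                      self._rng.uniform(20.0, 90.0))
        return self.state

    def step(self, setpoint_c: float):
        # Clip the continuous action to the assumed thermostat range.
        low, high = SETPOINT_RANGE_C
        clipped = min(max(setpoint_c, low), high)
        reward = empirical_reward(clipped, *self.state)
        next_state = self.reset()   # conditions are redrawn after each action
        return next_state, reward
```

In a sketch like this each step is independent of the previous one, so an agent only has to learn the mapping from weather conditions to setpoint; a continuous-action method such as DDPG would be trained against `step` in place of a physics-based simulator.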

* 4 pages, 3 figures, code shared on Google Colab
