Abstract: Reinforcement learning (RL) methods learn optimal decisions in a stationary environment. However, the assumption of a stationary environment is very restrictive. In many real-world problems, such as traffic signal control and robotic applications, one often encounters non-stationary environments, and in these scenarios RL methods yield sub-optimal decisions. In this paper, we consider the problem of developing RL methods that obtain optimal decisions in a non-stationary environment. The goal is to maximize the long-term discounted reward when the underlying model of the environment changes over time. To achieve this, we first adapt a change point detection algorithm to detect changes in the statistics of the environment and then develop an RL algorithm that maximizes the long-run reward accrued. We illustrate that our change point method effectively detects changes in the model of the environment and thus enables the RL algorithm to maximize the long-run reward. We further validate the effectiveness of the proposed solution on non-stationary random Markov decision processes, a sensor energy management problem, and a traffic signal control problem.
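The abstract does not specify which change point test is adapted, so the following is only a minimal sketch of the overall idea: a generic one-sided CUSUM detector run on the reward stream, coupled with tabular Q-learning that discards its value estimates when a change is flagged. The `env` object, its `reset()`/`step()` interface, and all hyperparameters (`drift`, `threshold`, `alpha`, `gamma`, `eps`) are assumptions for illustration, not the paper's method.

```python
import numpy as np

class CusumDetector:
    """One-sided CUSUM test on the observed reward stream.

    A generic stand-in for the paper's (unspecified) change point method:
    flags a change when the cumulative deviation of rewards from their
    running mean exceeds a threshold.
    """
    def __init__(self, drift=0.05, threshold=10.0):
        self.n, self.mean, self.stat = 0, 0.0, 0.0
        self.drift, self.threshold = drift, threshold

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n        # running mean of rewards
        self.stat = max(0.0, self.stat + abs(x - self.mean) - self.drift)
        if self.stat > self.threshold:               # change detected: reset test
            self.n, self.mean, self.stat = 0, 0.0, 0.0
            return True
        return False

def q_learning_with_restarts(env, n_states, n_actions, episodes=500,
                             alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning that discards its value estimates whenever the
    detector signals a change in the environment's reward statistics.
    `env` is a hypothetical simulator with reset() -> state and
    step(a) -> (next_state, reward, done)."""
    Q = np.zeros((n_states, n_actions))
    detector = CusumDetector()
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = (np.random.randint(n_actions) if np.random.rand() < eps
                 else int(np.argmax(Q[s])))
            s2, r, done = env.step(a)
            if detector.update(r):
                Q[:] = 0.0                           # relearn from scratch
            Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
            s = s2
    return Q
```

Resetting the Q-table on detection is the simplest possible response to a model change; a practical system might instead switch to a previously learned table or anneal the learning rate, but the sketch conveys how detection and learning interact.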
Abstract: We consider the problem of finding optimal energy sharing policies that maximize the network performance of a system comprising multiple sensor nodes and a single energy harvesting (EH) source. Sensor nodes periodically sense a random field and generate data, which is stored in the corresponding data queues. The EH source harvests energy from ambient sources and stores it in an energy buffer. Sensor nodes receive energy for data transmission from the EH source, which must efficiently share the stored energy among the nodes in order to minimize the long-run average delay in data transmission. We formulate the energy sharing problem in the framework of average-cost infinite-horizon Markov decision processes (MDPs). We develop efficient energy sharing algorithms, namely Q-learning algorithms with exploration mechanisms based on the $\epsilon$-greedy method and upper confidence bounds (UCB). We extend these algorithms by incorporating state and action space aggregation to tackle the state-action space explosion in the MDP. We also develop a cross-entropy based method that incorporates policy parameterization in order to find near-optimal energy sharing policies. Through simulations, we show that our algorithms yield energy sharing policies that outperform the heuristic greedy method.
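As a rough illustration of the two exploration mechanisms named in the abstract, the sketch below runs tabular Q-learning with either $\epsilon$-greedy or a UCB1-style bonus on a discretized energy-sharing MDP. The `env` simulator, its state/action encoding, and all parameters (`eps`, `c`, `alpha`, `gamma`) are assumptions; in particular, the average-cost criterion is approximated here with a heavily discounted formulation rather than the paper's exact average-cost updates.

```python
import numpy as np

def select_action(Q, counts, s, t, explore="eps", eps=0.1, c=2.0):
    """Exploration for tabular Q-learning: epsilon-greedy picks a random
    action with probability eps; UCB adds a count-based bonus to Q[s]."""
    n_actions = Q.shape[1]
    if explore == "eps":
        if np.random.rand() < eps:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))
    bonus = c * np.sqrt(np.log(t + 2) / (counts[s] + 1e-8))  # UCB1-style
    return int(np.argmax(Q[s] + bonus))

def q_learning_energy_sharing(env, n_states, n_actions, steps=100_000,
                              alpha=0.1, gamma=0.99, explore="ucb"):
    """Q-learning on a discretized energy-sharing MDP. `env` is a
    hypothetical simulator whose state encodes (energy buffer level,
    data queue lengths), whose action is an energy allocation across
    nodes, and whose step(a) returns (next_state, delay)."""
    Q = np.zeros((n_states, n_actions))
    counts = np.zeros((n_states, n_actions))
    s = env.reset()
    for t in range(steps):
        a = select_action(Q, counts, s, t, explore=explore)
        s2, delay = env.step(a)              # cost = data transmission delay
        counts[s, a] += 1
        # Minimize delay by maximizing its negation.
        Q[s, a] += alpha * (-delay + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2
    return Q
```

The state and action space aggregation and the cross-entropy policy search mentioned in the abstract would sit on top of this skeleton: aggregation maps the raw (buffer, queues) state onto a coarser index before the table lookup, while the cross-entropy method replaces the table with a parameterized policy whose parameters are sampled and refit iteratively.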