Devon Sigler

A Comparison of Model-Free and Model Predictive Control for Price Responsive Water Heaters

Nov 08, 2021

David J. Biagioni, Xiangyu Zhang, Peter Graf, Devon Sigler, Wesley Jones

Abstract: We present a careful comparison of two model-free control algorithms, Evolution Strategies (ES) and Proximal Policy Optimization (PPO), with receding horizon model predictive control (MPC) for operating simulated, price responsive water heaters. Four MPC variants are considered: a one-shot controller with perfect forecasting yielding optimal control; a limited-horizon controller with perfect forecasting; a mean forecasting-based controller; and a two-stage stochastic programming controller using historical scenarios. In all cases, the MPC models for water temperature and electricity price are exact; only water demand is uncertain. For comparison, both ES and PPO learn neural network-based policies by directly interacting with the simulated environment under the same scenarios used by MPC. All methods are then evaluated on a separate one-week continuation of the demand time series. We demonstrate that optimal control for this problem is challenging, requiring a lookahead of more than 8 hours for MPC with perfect forecasting to attain the minimum cost. Despite this challenge, both ES and PPO learn good general purpose policies that outperform mean forecast and two-stage stochastic MPC controllers in terms of average cost and are more than two orders of magnitude faster at computing actions. We show that ES in particular can leverage parallelism to learn a policy in under 90 seconds using 1150 CPU cores.

* In Proceedings of the 1st International Workshop on Reinforcement Learning for Energy Management in Buildings & Cities, pp. 29-33. 2020
* All authors are with the Computational Science Center at the National Renewable Energy Laboratory. Corresponding author: David Biagioni (dave.biagioni@nrel.gov)
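
To make the model-free idea in the abstract concrete, here is a minimal sketch of an evolution strategy tuning an on/off water heater policy against a time-of-use price. It is not the authors' code: the toy thermal model, the tariff, the demand process, the penalty weights, and every function name (`price`, `demand_drop`, `episode_cost`, `mean_cost`, `evolution_strategy`) are assumptions made for illustration, and a three-parameter linear policy stands in for the neural network policies used in the paper.

```python
import random

STEPS = 96                 # one day at 15-minute resolution (assumed)
T_MIN, T_MAX = 45.0, 65.0  # comfort floor and safety ceiling in deg C (assumed)

def price(t):
    """Assumed time-of-use tariff: expensive from 16:00 to 21:00."""
    hour = t / 4.0
    return 0.30 if 16 <= hour < 21 else 0.10

def demand_drop(t, rng):
    """Random hot-water draw, expressed as a temperature drop in deg C."""
    hour = t / 4.0
    busy = 6 <= hour < 9 or 18 <= hour < 22
    return rng.uniform(1.0, 3.0) if busy and rng.random() < 0.5 else 0.0

def episode_cost(theta, seed):
    """Simulate one day under a linear on/off policy and return its cost."""
    rng = random.Random(seed)  # the seed fixes the demand scenario
    temp, cost = 55.0, 0.0
    for t in range(STEPS):
        # Policy features: bias, normalized temperature, normalized price.
        score = (theta[0]
                 + theta[1] * (temp - 55.0) / 10.0
                 + theta[2] * price(t) / 0.30)
        if score > 0.0 and temp < T_MAX:
            temp += 1.5                    # heating gain per step
            cost += price(t) * 1.125       # 4.5 kW element running for 0.25 h
        temp -= 0.1 + demand_drop(t, rng)  # standing loss plus water draw
        if temp < T_MIN:
            cost += 0.5 * (T_MIN - temp)   # comfort penalty
    return cost

def mean_cost(theta, seeds):
    """Average cost of a policy over a set of demand scenarios."""
    return sum(episode_cost(theta, s) for s in seeds) / len(seeds)

def evolution_strategy(theta, seeds, iters=30, pop=16, sigma=0.3, lr=0.05, seed=0):
    """Antithetic-sampling ES; returns the best parameters seen and their cost."""
    rng = random.Random(seed)
    best_theta, best_cost = list(theta), mean_cost(theta, seeds)
    for _ in range(iters):
        grad = [0.0] * len(theta)
        for _ in range(pop // 2):
            eps = [rng.gauss(0.0, 1.0) for _ in theta]
            plus = [p + sigma * e for p, e in zip(theta, eps)]
            minus = [p - sigma * e for p, e in zip(theta, eps)]
            c_plus, c_minus = mean_cost(plus, seeds), mean_cost(minus, seeds)
            for cand, c in ((plus, c_plus), (minus, c_minus)):
                if c < best_cost:
                    best_theta, best_cost = cand, c
            for i, e in enumerate(eps):
                grad[i] += (c_plus - c_minus) * e
        # Step against the estimated cost gradient (no model of the plant needed).
        theta = [p - lr * g / (pop * sigma) for p, g in zip(theta, grad)]
    return best_theta, best_cost

if __name__ == "__main__":
    train_seeds = list(range(8))
    theta0 = [0.0, -1.0, 0.0]  # plain thermostat: heat whenever below 55 deg C
    theta, cost = evolution_strategy(theta0, train_seeds)
    print("thermostat cost:", mean_cost(theta0, train_seeds))
    print("ES policy cost: ", cost)
    print("held-out cost:  ", mean_cost(theta, range(100, 108)))
```

Each perturbed policy is scored independently of the others, which is the property that lets ES spread its evaluations across many CPU cores. Scoring the result on seeds not used for training loosely mirrors the abstract's evaluation on a separate continuation of the demand series.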
