Abstract:Recent advancements in machine learning based energy management approaches, specifically reinforcement learning with a safety layer (OptLayerPolicy) and a metaheuristic algorithm generating a decision tree control policy (TreeC), have shown promise. However, their effectiveness has only been demonstrated in computer simulations. This paper presents the real-world validation of these methods, comparing against model predictive control and simple rule-based control benchmark. The experiments were conducted on the electrical installation of 4 reproductions of residential houses, which all have their own battery, photovoltaic and dynamic load system emulating a non-controllable electrical load and a controllable electric vehicle charger. The results show that the simple rules, TreeC, and model predictive control-based methods achieved similar costs, with a difference of only 0.6%. The reinforcement learning based method, still in its training phase, obtained a cost 25.5\% higher to the other methods. Additional simulations show that the costs can be further reduced by using a more representative training dataset for TreeC and addressing errors in the model predictive control implementation caused by its reliance on accurate data from various sources. The OptLayerPolicy safety layer allows safe online training of a reinforcement learning agent in the real-world, given an accurate constraint function formulation. The proposed safety layer method remains error-prone, nonetheless, it is found beneficial for all investigated methods. The TreeC method, which does require building a realistic simulation for training, exhibits the safest operational performance, exceeding the grid limit by only 27.1 Wh compared to 593.9 Wh for reinforcement learning.
Abstract:This research addresses the challenge of integrating forecasting and optimization in energy management systems, focusing on the impacts of switching costs, forecast accuracy, and stability. It proposes a novel framework for analyzing online optimization problems with switching costs and enabled by deterministic and probabilistic forecasts. Through empirical evaluation and theoretical analysis, the research reveals the balance between forecast accuracy, stability, and switching costs in shaping policy performance. Conducted in the context of battery scheduling within energy management applications, it introduces a metric for evaluating probabilistic forecast stability and examines the effects of forecast accuracy and stability on optimization outcomes using the real-world case of the Citylearn 2022 competition. Findings indicate that switching costs significantly influence the trade-off between forecast accuracy and stability, highlighting the importance of integrated systems that enable collaboration between forecasting and operational units for improved decision-making. The study shows that committing to a policy for longer periods can be advantageous over frequent updates. Results also show a correlation between forecast stability and policy performance, suggesting that stable forecasts can mitigate switching costs. The proposed framework provides valuable insights for energy sector decision-makers and forecast practitioners when designing the operation of an energy management system.
Abstract:Safe reinforcement learning (RL) with hard constraint guarantees is a promising optimal control direction for multi-energy management systems. It only requires the environment-specific constraint functions itself a prior and not a complete model (i.e. plant, disturbance and noise models, and prediction models for states not included in the plant model - e.g. demand, weather, and price forecasts). The project-specific upfront and ongoing engineering efforts are therefore still reduced, better representations of the underlying system dynamics can still be learned and modeling bias is kept to a minimum (no model-based objective function). However, even the constraint functions alone are not always trivial to accurately provide in advance (e.g. an energy balance constraint requires the detailed determination of all energy inputs and outputs), leading to potentially unsafe behavior. In this paper, we present two novel advancements: (I) combining the Optlayer and SafeFallback method, named OptLayerPolicy, to increase the initial utility while keeping a high sample efficiency. (II) introducing self-improving hard constraints, to increase the accuracy of the constraint functions as more data becomes available so that better policies can be learned. Both advancements keep the constraint formulation decoupled from the RL formulation, so that new (presumably better) RL algorithms can act as drop-in replacements. We have shown that, in a simulated multi-energy system case study, the initial utility is increased to 92.4% (OptLayerPolicy) compared to 86.1% (OptLayer) and that the policy after training is increased to 104.9% (GreyOptLayerPolicy) compared to 103.4% (OptLayer) - all relative to a vanilla RL benchmark. While introducing surrogate functions into the optimization problem requires special attention, we do conclude that the newly presented GreyOptLayerPolicy method is the most advantageous.
Abstract:Energy management systems (EMS) have classically been implemented based on rule-based control (RBC) and model predictive control (MPC) methods. Recent research are investigating reinforcement learning (RL) as a new promising approach. This paper introduces TreeC, a machine learning method that uses the metaheuristic algorithm covariance matrix adaptation evolution strategy (CMA-ES) to generate an interpretable EMS modeled as a decision tree. This method learns the decision strategy of the EMS based on historical data contrary to RBC and MPC approaches that are typically considered as non adaptive solutions. The decision strategy of the EMS is modeled as a decision tree and is thus interpretable contrary to RL which mainly uses black-box models (e.g. neural networks). The TreeC method is compared to RBC, MPC and RL strategies in two study cases taken from literature: (1) an electric grid case and (2) a household heating case. The results show that TreeC obtains close performances than MPC with perfect forecast in both cases and obtains similar performances to RL in the electrical grid case and outperforms RL in the household heating case. TreeC demonstrates a performant application of machine learning for energy management systems that is also fully interpretable.
Abstract:Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems. It does not require a model a priori - reducing the upfront and ongoing project-specific engineering effort and is capable of learning better representations of the underlying system dynamics. However, vanilla RL does not provide constraint satisfaction guarantees - resulting in various potentially unsafe interactions within its safety-critical environment. In this paper, we present two novel safe RL methods, namely SafeFallback and GiveSafe, where the safety constraint formulation is decoupled from the RL formulation and which provides hard-constraint satisfaction guarantees both during training a (near) optimal policy (which involves exploratory and exploitative, i.e. greedy, steps) as well as during deployment of any policy (e.g. random agents or offline trained RL agents). In a simulated multi-energy systems case study we have shown that both methods start with a significantly higher utility (i.e. useful policy) compared to a vanilla RL benchmark (94,6% and 82,8% compared to 35,5%) and that the proposed SafeFallback method even can outperform the vanilla RL benchmark (102,9% to 100%). We conclude that both methods are viably safety constraint handling techniques applicable beyond RL, as demonstrated with random policies while still providing hard-constraint guarantees. Finally, we propose directions for future work to i.a. improve the constraint functions itself as more data becomes available.
Abstract:This paper presents a solution to a predict then optimise problem which goal is to reduce the electricity cost of a university campus. The proposed methodology combines a multi-dimensional time series forecast and a novel approach to large-scale optimization. Gradient-boosting method is applied to forecast both generation and consumption time-series of the Monash university campus for the month of November 2020. For the consumption forecasts we employ log transformation to model trend and stabilize variance. Additional seasonality and trend features are added to the model inputs when applicable. The forecasts obtained are used as the base load for the schedule optimisation of university activities and battery usage. The goal of the optimisation is to minimize the electricity cost consisting of the price of electricity and the peak electricity tariff both altered by the load from class activities and battery use as well as the penalty of not scheduling some optional activities. The schedule of the class activities is obtained through evolutionary optimisation using the covariance matrix adaptation evolution strategy and the genetic algorithm. This schedule is then improved through local search by testing possible times for each activity one-by-one. The battery schedule is formulated as a mixed-integer programming problem and solved by the Gurobi solver. This method obtains the second lowest cost when evaluated against 6 other methods presented at an IEEE competition that all used mixed-integer programming and the Gurobi solver to schedule both the activities and the battery use.
Abstract:Model-predictive-control (MPC) offers an optimal control technique to establish and ensure that the total operation cost of multi-energy systems remains at a minimum while fulfilling all system constraints. However, this method presumes an adequate model of the underlying system dynamics, which is prone to modelling errors and is not necessarily adaptive. This has an associated initial and ongoing project-specific engineering cost. In this paper, we present an on- and off-policy multi-objective reinforcement learning (RL) approach, that does not assume a model a priori, benchmarking this against a linear MPC (LMPC - to reflect current practice, though non-linear MPC performs better) - both derived from the general optimal control problem, highlighting their differences and similarities. In a simple multi-energy system (MES) configuration case study, we show that a twin delayed deep deterministic policy gradient (TD3) RL agent offers potential to match and outperform the perfect foresight LMPC benchmark (101.5%). This while the realistic LMPC, i.e. imperfect predictions, only achieves 98%. While in a more complex MES system configuration, the RL agent's performance is generally lower (94.6%), yet still better than the realistic LMPC (88.9%). In both case studies, the RL agents outperformed the realistic LMPC after a training period of 2 years using quarterly interactions with the environment. We conclude that reinforcement learning is a viable optimal control technique for multi-energy systems given adequate constraint handling and pre-training, to avoid unsafe interactions and long training periods, as is proposed in fundamental future work.