Maarten Messagie

Real-world validation of safe reinforcement learning, model predictive control and decision tree-based home energy management systems

Aug 14, 2024

Julian Ruddick, Glenn Ceusters, Gilles Van Kriekinge, Evgenii Genov, Thierry Coosemans, Maarten Messagie

Abstract: Recent advancements in machine learning based energy management approaches, specifically reinforcement learning with a safety layer (OptLayerPolicy) and a metaheuristic algorithm generating a decision tree control policy (TreeC), have shown promise. However, their effectiveness has only been demonstrated in computer simulations. This paper presents the real-world validation of these methods, comparing them against model predictive control and a simple rule-based control benchmark. The experiments were conducted on the electrical installation of four reproductions of residential houses, each with its own battery, photovoltaic and dynamic load system emulating a non-controllable electrical load and a controllable electric vehicle charger. The results show that the simple rules, TreeC, and model predictive control-based methods achieved similar costs, with a difference of only 0.6%. The reinforcement learning based method, still in its training phase, obtained a cost 25.5% higher than the other methods. Additional simulations show that the costs can be further reduced by using a more representative training dataset for TreeC and addressing errors in the model predictive control implementation caused by its reliance on accurate data from various sources. The OptLayerPolicy safety layer allows safe online training of a reinforcement learning agent in the real world, given an accurate constraint function formulation. The proposed safety layer method remains error-prone; nonetheless, it is found beneficial for all investigated methods. The TreeC method, which does require building a realistic simulation for training, exhibits the safest operational performance, exceeding the grid limit by only 27.1 Wh compared to 593.9 Wh for reinforcement learning.
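The abstract reports safety as energy exceeding the grid limit, in Wh. One plausible way to compute such a figure from sampled grid power is sketched below. The function name, the fixed sampling step and the import-only limit are assumptions made for illustration, not the paper's definition.

```python
def grid_limit_exceedance_wh(power_w, limit_w, step_s):
    """Energy in Wh drawn above the grid limit.

    power_w: grid import power samples in W (assumed evenly spaced).
    limit_w: grid import limit in W (assumed constant).
    step_s:  sampling interval in seconds.
    """
    hours_per_step = step_s / 3600.0
    # Only the part of each sample above the limit counts as a violation.
    excess_w = sum(max(0.0, p - limit_w) for p in power_w)
    return excess_w * hours_per_step


# Three 15-minute samples; only the last one exceeds the 4 kW limit, by 1 kW.
example_wh = grid_limit_exceedance_wh([1000.0, 3000.0, 5000.0], 4000.0, 900)
```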

Predict. Optimize. Revise. On Forecast and Policy Stability in Energy Management Systems

Jun 29, 2024

Evgenii Genov, Julian Ruddick, Christoph Bergmeir, Majid Vafaeipour, Thierry Coosemans, Salvador Garcia, Maarten Messagie

Abstract: This research addresses the challenge of integrating forecasting and optimization in energy management systems, focusing on the impacts of switching costs, forecast accuracy, and stability. It proposes a novel framework for analyzing online optimization problems with switching costs, enabled by deterministic and probabilistic forecasts. Through empirical evaluation and theoretical analysis, the research reveals the balance between forecast accuracy, stability, and switching costs in shaping policy performance. Conducted in the context of battery scheduling within energy management applications, it introduces a metric for evaluating probabilistic forecast stability and examines the effects of forecast accuracy and stability on optimization outcomes using the real-world case of the CityLearn 2022 competition. Findings indicate that switching costs significantly influence the trade-off between forecast accuracy and stability, highlighting the importance of integrated systems that enable collaboration between forecasting and operational units for improved decision-making. The study shows that committing to a policy for longer periods can be advantageous over frequent updates. Results also show a correlation between forecast stability and policy performance, suggesting that stable forecasts can mitigate switching costs. The proposed framework provides valuable insights for energy sector decision-makers and forecast practitioners when designing the operation of an energy management system.
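To make the role of switching costs concrete, the sketch below evaluates a schedule as the sum of its per-step costs plus a penalty proportional to how much the action changes between steps. The absolute-difference penalty, the weight `beta` and all numbers are illustrative assumptions, not the formulation used in the paper.

```python
def total_cost(actions, stage_costs, beta, initial_action=0.0):
    """Sum of stage costs plus beta times the total change in action."""
    switching = 0.0
    previous = initial_action
    for action in actions:
        switching += abs(action - previous)
        previous = action
    return sum(stage_costs) + beta * switching


# A plan that reacts to every forecast update: cheaper per step, but it flips.
reactive = total_cost([1.0, -1.0, 1.0], [2.0, 2.0, 2.0], beta=0.5)
# A plan that commits to one action: dearer per step, but it barely switches.
committed = total_cost([1.0, 1.0, 1.0], [2.5, 2.5, 2.5], beta=0.5)
```

With these made-up numbers the committed plan ends up cheaper overall (8.0 against 8.5), which is the kind of trade-off the abstract describes when it says that committing to a policy for longer can beat frequent updates.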

* 10 pages, contains an abstract section with a comment on KPIs, MPC formulation, and extra results on the in-sample performance

Safe reinforcement learning with self-improving hard constraints for multi-energy management systems

Apr 18, 2023

Glenn Ceusters, Muhammad Andy Putratama, Rüdiger Franke, Ann Nowé, Maarten Messagie

Abstract: Safe reinforcement learning (RL) with hard constraint guarantees is a promising optimal control direction for multi-energy management systems. It only requires the environment-specific constraint functions themselves a priori and not a complete model (i.e. plant, disturbance and noise models, and prediction models for states not included in the plant model - e.g. demand, weather, and price forecasts). The project-specific upfront and ongoing engineering efforts are therefore still reduced, better representations of the underlying system dynamics can still be learned and modeling bias is kept to a minimum (no model-based objective function). However, even the constraint functions alone are not always trivial to accurately provide in advance (e.g. an energy balance constraint requires the detailed determination of all energy inputs and outputs), leading to potentially unsafe behavior. In this paper, we present two novel advancements: (I) combining the OptLayer and SafeFallback methods, named OptLayerPolicy, to increase the initial utility while keeping a high sample efficiency; and (II) introducing self-improving hard constraints, to increase the accuracy of the constraint functions as more data becomes available so that better policies can be learned. Both advancements keep the constraint formulation decoupled from the RL formulation, so that new (presumably better) RL algorithms can act as drop-in replacements. We have shown that, in a simulated multi-energy system case study, the initial utility is increased to 92.4% (OptLayerPolicy) compared to 86.1% (OptLayer) and that the policy after training is increased to 104.9% (GreyOptLayerPolicy) compared to 103.4% (OptLayer) - all relative to a vanilla RL benchmark. While introducing surrogate functions into the optimization problem requires special attention, we do conclude that the newly presented GreyOptLayerPolicy method is the most advantageous.
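As a rough illustration of a safety layer that is decoupled from the RL agent, the sketch below maps a proposed battery action to the closest action that satisfies a known grid-limit constraint. With a single action and interval bounds, that closest feasible action is obtained by clipping. The function, its parameters and the single-battery setting are hypothetical and much simpler than the multi-energy system studied in the paper.

```python
def project_to_safe(action_kw, load_kw, grid_limit_kw,
                    max_charge_kw, max_discharge_kw):
    """Closest feasible battery power (positive = charging, in kW).

    Assumed constraints: grid import (load + battery power) must not exceed
    grid_limit_kw, and battery power must stay within its rated limits.
    """
    lower = -max_discharge_kw
    upper = min(max_charge_kw, grid_limit_kw - load_kw)
    if upper < lower:
        # No battery action can keep the grid import within the limit.
        raise ValueError("no feasible action for this state")
    # In one dimension, Euclidean projection onto [lower, upper] is a clip.
    return min(max(action_kw, lower), upper)


# The agent asks for 5 kW of charging, but only 3 kW of headroom is left.
safe_action = project_to_safe(5.0, load_kw=3.0, grid_limit_kw=6.0,
                              max_charge_kw=4.0, max_discharge_kw=4.0)
```

Because the function only sees the proposed action and the constraint data, any agent could be placed in front of it, which mirrors the drop-in property the abstract emphasises.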

* 4579 words. arXiv admin note: text overlap with arXiv:2207.03830

TreeC: a method to generate interpretable energy management systems using a metaheuristic algorithm

Apr 17, 2023

Julian Ruddick, Luis Ramirez Camargo, Muhammad Andy Putratama, Maarten Messagie, Thierry Coosemans

Abstract: Energy management systems (EMS) have classically been implemented based on rule-based control (RBC) and model predictive control (MPC) methods. Recent research is investigating reinforcement learning (RL) as a promising new approach. This paper introduces TreeC, a machine learning method that uses the metaheuristic algorithm covariance matrix adaptation evolution strategy (CMA-ES) to generate an interpretable EMS modeled as a decision tree. This method learns the decision strategy of the EMS based on historical data, contrary to RBC and MPC approaches, which are typically considered non-adaptive solutions. The decision strategy of the EMS is modeled as a decision tree and is thus interpretable, contrary to RL, which mainly uses black-box models (e.g. neural networks). The TreeC method is compared to RBC, MPC and RL strategies in two case studies taken from the literature: (1) an electrical grid case and (2) a household heating case. The results show that TreeC obtains performance close to that of MPC with perfect forecast in both cases, obtains performance similar to RL in the electrical grid case and outperforms RL in the household heating case. TreeC demonstrates a performant application of machine learning for energy management systems that is also fully interpretable.
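The interpretability claim is easier to see with a toy example. The sketch below evaluates a hand-written decision tree that maps observed features to a battery action. The tree layout, feature names, thresholds and actions are invented for illustration; in TreeC the tree is generated by CMA-ES from historical data, which is not reproduced here.

```python
def tree_action(node, features):
    """Walk the tree until a leaf is reached and return its action.

    Internal nodes are tuples (feature_name, threshold, below, above);
    leaves are plain strings naming the action.
    """
    while isinstance(node, tuple):
        feature, threshold, below, above = node
        node = below if features[feature] < threshold else above
    return node


# Hypothetical policy: charge when electricity is cheap and the battery has
# room, discharge when it is expensive and the battery is not nearly empty.
policy = ("price", 0.20,
          ("soc", 0.9, "charge", "idle"),
          ("soc", 0.2, "idle", "discharge"))

action = tree_action(policy, {"price": 0.10, "soc": 0.5})
```

Each decision can be read directly off the tree as a chain of threshold comparisons, which is not possible with a neural network policy.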

Safe reinforcement learning for multi-energy management systems with known constraint functions

Jul 19, 2022

Glenn Ceusters, Luis Ramirez Camargo, Rüdiger Franke, Ann Nowé, Maarten Messagie

Abstract: Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems. It does not require a model a priori - reducing the upfront and ongoing project-specific engineering effort - and is capable of learning better representations of the underlying system dynamics. However, vanilla RL does not provide constraint satisfaction guarantees - resulting in various potentially unsafe interactions within its safety-critical environment. In this paper, we present two novel safe RL methods, namely SafeFallback and GiveSafe, where the safety constraint formulation is decoupled from the RL formulation and which provide hard-constraint satisfaction guarantees both during training of a (near) optimal policy (which involves exploratory and exploitative, i.e. greedy, steps) and during deployment of any policy (e.g. random agents or offline trained RL agents). In a simulated multi-energy systems case study we have shown that both methods start with a significantly higher utility (i.e. useful policy) compared to a vanilla RL benchmark (94.6% and 82.8% compared to 35.5%) and that the proposed SafeFallback method can even outperform the vanilla RL benchmark (102.9% to 100%). We conclude that both methods are viable safety constraint handling techniques applicable beyond RL, as demonstrated with random policies, while still providing hard-constraint guarantees. Finally, we propose directions for future work to, among other things, improve the constraint functions themselves as more data becomes available.
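The general idea of a fallback-based safety layer can be sketched in a few lines: check the proposed action against the known constraint functions and, if it violates them, substitute the action of a policy that is known to be safe. The sketch below is a generic illustration under that reading. The function names, the toy constraint and the fallback rule are assumptions, not the paper's SafeFallback implementation.

```python
def safe_fallback_step(state, proposed_action, is_safe, fallback_policy):
    """Return (action, used_fallback) for one control step.

    is_safe(state, action) -> bool encodes the known constraint functions.
    fallback_policy(state) -> action is assumed to always be safe.
    """
    if is_safe(state, proposed_action):
        return proposed_action, False
    return fallback_policy(state), True


# Toy constraint: grid import (load plus battery charging) stays within 6 kW.
def within_grid_limit(state, action_kw):
    return state["load_kw"] + action_kw <= 6.0


# Toy fallback: leave the battery idle, assumed safe for the loads considered.
def idle_battery(state):
    return 0.0


accepted = safe_fallback_step({"load_kw": 3.0}, 2.0, within_grid_limit, idle_battery)
rejected = safe_fallback_step({"load_kw": 3.0}, 5.0, within_grid_limit, idle_battery)
```

The check is independent of how the proposed action was produced, so the same wrapper works for a learning agent, a trained agent or a random policy.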

* 25 pages, 12 figures

Evolutionary scheduling of university activities based on consumption forecasts to minimise electricity costs

Feb 25, 2022

Julian Ruddick, Evgenii Genov, Luis Ramirez Camargo, Thierry Coosemans, Maarten Messagie

Abstract: This paper presents a solution to a predict-then-optimise problem whose goal is to reduce the electricity cost of a university campus. The proposed methodology combines a multi-dimensional time series forecast and a novel approach to large-scale optimisation. A gradient-boosting method is applied to forecast both generation and consumption time series of the Monash University campus for the month of November 2020. For the consumption forecasts we employ log transformation to model trend and stabilise variance. Additional seasonality and trend features are added to the model inputs when applicable. The forecasts obtained are used as the base load for the schedule optimisation of university activities and battery usage. The goal of the optimisation is to minimise the electricity cost, consisting of the price of electricity and the peak electricity tariff, both altered by the load from class activities and battery use, as well as the penalty of not scheduling some optional activities. The schedule of the class activities is obtained through evolutionary optimisation using the covariance matrix adaptation evolution strategy and the genetic algorithm. This schedule is then improved through local search by testing possible times for each activity one-by-one. The battery schedule is formulated as a mixed-integer programming problem and solved by the Gurobi solver. This method obtains the second lowest cost when evaluated against 6 other methods presented at an IEEE competition that all used mixed-integer programming and the Gurobi solver to schedule both the activities and the battery use.

* Submitted to the 2022 IEEE Congress on Evolutionary Computation

Model-predictive control and reinforcement learning in multi-energy system case studies

Apr 20, 2021

Glenn Ceusters, Román Cantú Rodríguez, Alberte Bouso García, Rüdiger Franke, Geert Deconinck, Lieve Helsen, Ann Nowé, Maarten Messagie, Luis Ramirez Camargo

Abstract: Model-predictive control (MPC) offers an optimal control technique to establish and ensure that the total operation cost of multi-energy systems remains at a minimum while fulfilling all system constraints. However, this method presumes an adequate model of the underlying system dynamics, which is prone to modelling errors and is not necessarily adaptive. This has an associated initial and ongoing project-specific engineering cost. In this paper, we present an on- and off-policy multi-objective reinforcement learning (RL) approach that does not assume a model a priori, benchmarking this against a linear MPC (LMPC - to reflect current practice, though non-linear MPC performs better) - both derived from the general optimal control problem, highlighting their differences and similarities. In a simple multi-energy system (MES) configuration case study, we show that a twin delayed deep deterministic policy gradient (TD3) RL agent offers potential to match and outperform the perfect foresight LMPC benchmark (101.5%), while the realistic LMPC, i.e. with imperfect predictions, only achieves 98%. In a more complex MES configuration, the RL agent's performance is generally lower (94.6%), yet still better than the realistic LMPC (88.9%). In both case studies, the RL agents outperformed the realistic LMPC after a training period of 2 years using quarterly interactions with the environment. We conclude that reinforcement learning is a viable optimal control technique for multi-energy systems given adequate constraint handling and pre-training, to avoid unsafe interactions and long training periods, as is proposed in fundamental future work.

* 35 pages, 27 figures
