Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Margaux Brégère

EDF R&D, LPSM

Online Episodic Convex Reinforcement Learning

May 12, 2025

Bianca Marin Moreno, Khaled Eldowa, Pierre Gaillard, Margaux Brégère, Nadia Oudjane

Abstract:We study online learning in episodic finite-horizon Markov decision processes (MDPs) with convex objective functions, known as the concave utility reinforcement learning (CURL) problem. This setting generalizes RL from linear to convex losses on the state-action distribution induced by the agent's policy. The non-linearity of CURL invalidates classical Bellman equations and requires new algorithmic approaches. We introduce the first algorithm achieving near-optimal regret bounds for online CURL without any prior knowledge on the transition function. To achieve this, we use an online mirror descent algorithm with varying constraint sets and a carefully designed exploration bonus. We then address for the first time a bandit version of CURL, where the only feedback is the value of the objective function on the state-action distribution induced by the agent's policy. We achieve a sub-linear regret bound for this more challenging problem by adapting techniques from bandit convex optimization to the MDP setting.

Via

Access Paper or Ask Questions

AutoML Algorithms for Online Generalized Additive Model Selection: Application to Electricity Demand Forecasting

Mar 31, 2025

Keshav Das, Julie Keisler, Margaux Brégère, Amaury Durand

Abstract:Electricity demand forecasting is key to ensuring that supply meets demand lest the grid would blackout. Reliable short-term forecasts may be obtained by combining a Generalized Additive Models (GAM) with a State-Space model (Obst et al., 2021), leading to an adaptive (or online) model. A GAM is an over-parameterized linear model defined by a formula and a state-space model involves hyperparameters. Both the formula and adaptation parameters have to be fixed before model training and have a huge impact on the model's predictive performance. We propose optimizing them using the DRAGON package of Keisler (2025), originally designed for neural architecture search. This work generalizes it for automated online generalized additive model selection by defining an efficient modeling of the search space (namely, the space of the GAM formulae and adaptation parameters). Its application to short-term French electricity demand forecasting demonstrates the relevance of the approach

* 13 pages, 1 figure

Via

Access Paper or Ask Questions

MetaCURL: Non-stationary Concave Utility Reinforcement Learning

May 30, 2024

Bianca Marin Moreno, Margaux Brégère, Pierre Gaillard, Nadia Oudjane

Figure 1 for MetaCURL: Non-stationary Concave Utility Reinforcement Learning

Abstract:We explore online learning in episodic loop-free Markov decision processes on non-stationary environments (changing losses and probability transitions). Our focus is on the Concave Utility Reinforcement Learning problem (CURL), an extension of classical RL for handling convex performance criteria in state-action distributions induced by agent policies. While various machine learning problems can be written as CURL, its non-linearity invalidates traditional Bellman equations. Despite recent solutions to classical CURL, none address non-stationary MDPs. This paper introduces MetaCURL, the first CURL algorithm for non-stationary MDPs. It employs a meta-algorithm running multiple black-box algorithms instances over different intervals, aggregating outputs via a sleeping expert framework. The key hurdle is partial information due to MDP uncertainty. Under partial information on the probability transitions (uncertainty and non-stationarity coming only from external noise, independent of agent state-action pairs), we achieve optimal dynamic regret without prior knowledge of MDP changes. Unlike approaches for RL, MetaCURL handles full adversarial losses, not just stochastic ones. We believe our approach for managing non-stationarity with experts can be of interest to the RL community.

Via

Access Paper or Ask Questions

Automated Deep Learning for Load Forecasting

May 14, 2024

Julie Keisler, Sandra Claudel, Gilles Cabriel, Margaux Brégère

Abstract:Accurate forecasting of electricity consumption is essential to ensure the performance and stability of the grid, especially as the use of renewable energy increases. Forecasting electricity is challenging because it depends on many external factors, such as weather and calendar variables. While regression-based models are currently effective, the emergence of new explanatory variables and the need to refine the temporality of the signals to be forecasted is encouraging the exploration of novel methodologies, in particular deep learning models. However, Deep Neural Networks (DNNs) struggle with this task due to the lack of data points and the different types of explanatory variables (e.g. integer, float, or categorical). In this paper, we explain why and how we used Automated Deep Learning (AutoDL) to find performing DNNs for load forecasting. We ended up creating an AutoDL framework called EnergyDragon by extending the DRAGON package and applying it to load forecasting. EnergyDragon automatically selects the features embedded in the DNN training in an innovative way and optimizes the architecture and the hyperparameters of the networks. We demonstrate on the French load signal that EnergyDragon can find original DNNs that outperform state-of-the-art load forecasting methods as well as other AutoDL approaches.

Via

Access Paper or Ask Questions

A Bandit Approach with Evolutionary Operators for Model Selection

Feb 07, 2024

Margaux Brégère, Julie Keisler

Abstract:This paper formulates model selection as an infinite-armed bandit problem. The models are arms, and picking an arm corresponds to a partial training of the model (resource allocation). The reward is the accuracy of the selected model after its partial training. In this best arm identification problem, regret is the gap between the expected accuracy of the optimal model and that of the model finally chosen. We first consider a straightforward generalization of UCB-E to the stochastic infinite-armed bandit problem and show that, under basic assumptions, the expected regret order is $T^{-\alpha}$ for some $\alpha \in (0,1/5)$ and $T$ the number of resources to allocate. From this vanilla algorithm, we introduce the algorithm Mutant-UCB that incorporates operators from evolutionary algorithms. Tests carried out on three open source image classification data sets attest to the relevance of this novel combining approach, which outperforms the state-of-the-art for a fixed budget.

Via

Access Paper or Ask Questions

Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent

Nov 30, 2023

Bianca Marin Moreno, Margaux Brégère, Pierre Gaillard, Nadia Oudjane

Figure 1 for Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent

Figure 2 for Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent

Figure 3 for Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent

Figure 4 for Efficient Model-Based Concave Utility Reinforcement Learning through Greedy Mirror Descent

Abstract:Many machine learning tasks can be solved by minimizing a convex function of an occupancy measure over the policies that generate them. These include reinforcement learning, imitation learning, among others. This more general paradigm is called the Concave Utility Reinforcement Learning problem (CURL). Since CURL invalidates classical Bellman equations, it requires new algorithms. We introduce MD-CURL, a new algorithm for CURL in a finite horizon Markov decision process. MD-CURL is inspired by mirror descent and uses a non-standard regularization to achieve convergence guarantees and a simple closed-form solution, eliminating the need for computationally expensive projection steps typically found in mirror descent approaches. We then extend CURL to an online learning scenario and present Greedy MD-CURL, a new method adapting MD-CURL to an online, episode-based setting with partially unknown dynamics. Like MD-CURL, the online version Greedy MD-CURL benefits from low computational complexity, while guaranteeing sub-linear or even logarithmic regret, depending on the level of information available on the underlying dynamics.

Via

Access Paper or Ask Questions

A mirror descent approach for Mean Field Control applied to Demande-Side management

Feb 16, 2023

Bianca Marin Moreno, Margaux Brégère, Pierre Gaillard, Nadia Oudjane

Abstract:We consider a finite-horizon Mean Field Control problem for Markovian models. The objective function is composed of a sum of convex and Lipschitz functions taking their values on a space of state-action distributions. We introduce an iterative algorithm which we prove to be a Mirror Descent associated with a non-standard Bregman divergence, having a convergence rate of order 1/ $\sqrt$ K. It requires the solution of a simple dynamic programming problem at each iteration. We compare this algorithm with learning methods for Mean Field Games after providing a reformulation of our control problem as a game problem. These theoretical contributions are illustrated with numerical examples applied to a demand-side management problem for power systems aimed at controlling the average power consumption profile of a population of flexible devices contributing to the power system balance.

Via

Access Paper or Ask Questions

Simulating Tariff Impact in Electrical Energy Consumption Profiles with Conditional Variational Autoencoders

Jun 10, 2020

Margaux Brégère, Ricardo J. Bessa

Figure 1 for Simulating Tariff Impact in Electrical Energy Consumption Profiles with Conditional Variational Autoencoders

Figure 2 for Simulating Tariff Impact in Electrical Energy Consumption Profiles with Conditional Variational Autoencoders

Figure 3 for Simulating Tariff Impact in Electrical Energy Consumption Profiles with Conditional Variational Autoencoders

Figure 4 for Simulating Tariff Impact in Electrical Energy Consumption Profiles with Conditional Variational Autoencoders

Abstract:The implementation of efficient demand response (DR) programs for household electricity consumption would benefit from data-driven methods capable of simulating the impact of different tariffs schemes. This paper proposes a novel method based on conditional variational autoencoders (CVAE) to generate, from an electricity tariff profile combined with exogenous weather and calendar variables, daily consumption profiles of consumers segmented in different clusters. First, a large set of consumers is gathered into clusters according to their consumption behavior and price-responsiveness. The clustering method is based on a causality model that measures the effect of a specific tariff on the consumption level. Then, daily electrical energy consumption profiles are generated for each cluster with CVAE. This non-parametric approach is compared to a semi-parametric data generator based on generalized additive models and that uses prior knowledge of energy consumption. Experiments in a publicly available data set show that, the proposed method presents comparable performance to the semi-parametric one when it comes to generating the average value of the original data. The main contribution from this new method is the capacity to reproduce rebound and side effects in the generated consumption profiles. Indeed, the application of a special electricity tariff over a time window may also affect consumption outside this time window. Another contribution is that the clustering approach segments consumers according to their daily consumption profile and elasticity to tariff changes. These two results combined are very relevant for an ex-ante testing of future DR policies by system operators, retailers and energy regulators.

* 27 pages, 8 figures

Via

Access Paper or Ask Questions

Online Hierarchical Forecasting for Power Consumption Data

Mar 01, 2020

Margaux Brégère, Malo Huard

Figure 1 for Online Hierarchical Forecasting for Power Consumption Data

Figure 2 for Online Hierarchical Forecasting for Power Consumption Data

Figure 3 for Online Hierarchical Forecasting for Power Consumption Data

Figure 4 for Online Hierarchical Forecasting for Power Consumption Data

Abstract:We study the forecasting of the power consumptions of a population of households and of subpopulations thereof. These subpopulations are built according to location, to exogenous information and/or to profiles we determined from historical households consumption time series. Thus, we aim to forecast the electricity consumption time series at several levels of households aggregation. These time series are linked through some summation constraints which induce a hierarchy. Our approach consists in three steps: feature generation, aggregation and projection. Firstly (feature generation step), we build, for each considering group for households, a benchmark forecast (called features), using random forests or generalized additive models. Secondly (aggregation step), aggregation algorithms, run in parallel, aggregate these forecasts and provide new predictions. Finally (projection step), we use the summation constraints induced by the time series underlying hierarchy to re-conciliate the forecasts by projecting them in a well-chosen linear subspace. We provide some theoretical guaranties on the average prediction error of this methodology, through the minimization of a quantity called regret. We also test our approach on households power consumption data collected in Great Britain by multiple energy providers in the Energy Demand Research Project context. We build and compare various population segmentations for the evaluation of our approach performance.

Via

Access Paper or Ask Questions

Target Tracking for Contextual Bandits: Application to Demand Side Management

Jan 28, 2019

Margaux Brégère, Pierre Gaillard, Yannig Goude, Gilles Stoltz

Figure 1 for Target Tracking for Contextual Bandits: Application to Demand Side Management

Figure 2 for Target Tracking for Contextual Bandits: Application to Demand Side Management

Abstract:We propose a contextual-bandit approach for demand side management by offering price incentives. More precisely, a target mean consumption is set at each round and the mean consumption is modeled as a complex function of the distribution of prices sent and of some contextual variables such as the temperature, weather, and so on. The performance of our strategies is measured in quadratic losses through a regret criterion. We offer $\sqrt{T}$ upper bounds on this regret (up to poly-logarithmic terms), for strategies inspired by standard strategies for contextual bandits (like LinUCB, Li et al., 2010). Simulations on a real data set gathered by UK Power Networks, in which price incentives were offered, show that our strategies are effective and may indeed manage demand response by suitably picking the price levels.

Via

Access Paper or Ask Questions