Abstract:We assume accurate observation of battery state of charge (SOC) and precise speed curves when addressing the constrained optimal fuel consumption (COFC) problem via constrained reinforcement learning (CRL). However, in practice, SOC measurements are often distorted by noise or confidentiality protocols, and actual reference speeds may deviate from expectations. We aim to minimize fuel consumption while maintaining SOC balance under observational perturbations in SOC and speed. This work first worldwide uses seven training approaches to solve the COFC problem under five types of perturbations, including one based on a uniform distribution, one designed to maximize rewards, one aimed at maximizing costs, and one along with its improved version that seeks to decrease reward on Toyota Hybrid Systems (THS) under New European Driving Cycle (NEDC) condition. The result verifies that the six can successfully solve the COFC problem under observational perturbations, and we further compare the robustness and safety of these training approaches and analyze their impact on optimal fuel consumption.
Abstract:Hybrid electric vehicles (HEVs) are becoming increasingly popular because they can better combine the working characteristics of internal combustion engines and electric motors. However, the minimum fuel consumption of an HEV for a battery electrical balance case under a specific assembly condition and a specific speed curve still needs to be clarified in academia and industry. Regarding this problem, this work provides the mathematical expression of constrained optimal fuel consumption (COFC) from the perspective of constrained reinforcement learning (CRL) for the first time globally. Also, two mainstream approaches of CRL, constrained variational policy optimization (CVPO) and Lagrangian-based approaches, are utilized for the first time to obtain the vehicle's minimum fuel consumption under the battery electrical balance condition. We conduct case studies on the well-known Prius TOYOTA hybrid system (THS) under the NEDC condition; we give vital steps to implement CRL approaches and compare the performance between the CVPO and Lagrangian-based approaches. Our case study found that CVPO and Lagrangian-based approaches can obtain the lowest fuel consumption while maintaining the SOC balance constraint. The CVPO approach converges stable, but the Lagrangian-based approach can obtain the lowest fuel consumption at 3.95 L/100km, though with more significant oscillations. This result verifies the effectiveness of our proposed CRL approaches to the COFC problem.