Abstract:Incorporating user preferences into multi-objective Bayesian optimization (MOBO) allows for personalization of the optimization procedure. Preferences are often abstracted in the form of an unknown utility function, estimated through pairwise comparisons of potential outcomes. However, utility-driven MOBO methods can yield solutions that are dominated by nearby solutions, as non-dominance is not enforced. Additionally, classical MOBO commonly relies on estimating the entire Pareto-front to identify the Pareto-optimal solutions, which can be expensive and ignore user preferences. Here, we present a new method, termed preference-utility-balanced MOBO (PUB-MOBO), that allows users to disambiguate between near-Pareto candidate solutions. PUB-MOBO combines utility-based MOBO with local multi-gradient descent to refine user-preferred solutions to be near-Pareto-optimal. To this end, we propose a novel preference-dominated utility function that concurrently preserves user-preferences and dominance amongst candidate solutions. A key advantage of PUB-MOBO is that the local search is restricted to a (small) region of the Pareto-front directed by user preferences, alleviating the need to estimate the entire Pareto-front. PUB-MOBO is tested on three synthetic benchmark problems: DTLZ1, DTLZ2 and DH1, as well as on three real-world problems: Vehicle Safety, Conceptual Marine Design, and Car Side Impact. PUB-MOBO consistently outperforms state-of-the-art competitors in terms of proximity to the Pareto-front and utility regret across all the problems.
Abstract:Deep neural networks are increasingly used as an effective way to represent control policies in a wide-range of learning-based control methods. For continuous-time optimal control problems (OCPs), which are central to many decision-making tasks, control policy learning can be cast as a neural ordinary differential equation (NODE) problem wherein state and control constraints are naturally accommodated. This paper presents a Lyapunov-NODE control (L-NODEC) approach to solving continuous-time OCPs for the case of stabilizing a known constrained nonlinear system around a terminal equilibrium point. We propose a Lyapunov loss formulation that incorporates a control-theoretic Lyapunov condition into the problem of learning a state-feedback neural control policy. We establish that L-NODEC ensures exponential stability of the controlled system, as well as its adversarial robustness to uncertain initial conditions. The performance of L-NODEC is illustrated on a benchmark double integrator problem and for optimal control of thermal dose delivery using a cold atmospheric plasma biomedical system. L-NODEC can substantially reduce the inference time necessary to reach the equilibrium state.