Title: C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front

Oct 03, 2024

Ruohong Liu, Yuxin Pan, Linjie Xu, Lei Song, Pengcheng You, Yize Chen, Jiang Bian



Abstract: Multi-objective reinforcement learning (MORL) excels at handling rapidly changing preferences in tasks that involve multiple criteria, even for unseen preferences. However, previously dominant MORL methods typically generate a fixed policy set or a preference-conditioned policy through multiple training iterations exclusively for sampled preference vectors, and cannot ensure efficient discovery of the Pareto front. Furthermore, integrating preferences into the input of policy or value functions presents scalability challenges, in particular as the dimensions of the state and preference spaces grow, which can complicate the learning process and hinder the algorithm's performance on more complex tasks. To address these issues, we propose a two-stage Pareto front discovery algorithm called Constrained MORL (C-MORL), which serves as a seamless bridge between constrained policy optimization and MORL. Concretely, a set of policies is trained in parallel in the initialization stage, with each policy optimized towards its individual preference over the multiple objectives. Then, to fill the remaining vacancies in the Pareto front, constrained optimization steps are employed to maximize one objective while constraining the other objectives to exceed a predefined threshold. Empirically, compared to recent MORL methods, our algorithm achieves more consistent and superior performance in terms of hypervolume, expected utility, and sparsity on both discrete and continuous control tasks, especially with numerous objectives (up to nine objectives in our experiments).
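To make the Pareto front and the hypervolume metric named in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes two objectives that are both maximized, and the function names (`dominates`, `pareto_front`, `hypervolume_2d`) and the reference point are illustrative choices. Each point stands for the vector of returns achieved by one policy.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every objective and
    strictly better in at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the front and bounded below by the reference
    point `ref` (two-objective case only; an assumption of this sketch)."""
    # Sort by the first objective, largest first; along a 2-D front the
    # second objective then increases, so the area splits into strips.
    front = sorted(pareto_front(points), key=lambda p: p[0], reverse=True)
    area = 0.0
    prev_y = ref[1]
    for x, y in front:
        if x <= ref[0] or y <= prev_y:
            continue  # contributes no new area above the reference point
        area += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return area


if __name__ == "__main__":
    # Hypothetical two-objective returns of five policies.
    returns = [(1.0, 4.0), (2.0, 3.0), (3.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
    print(pareto_front(returns))
    print(hypervolume_2d(returns, ref=(0.0, 0.0)))
```

A larger hypervolume means the policy set covers more of the objective space. In this toy picture, filling a vacancy in the front corresponds to adding a new non-dominated point between two existing ones, which increases the area.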

* 27 pages, 8 figures. In submission to a conference
