Aleksandar Milenovic

UCB-driven Utility Function Search for Multi-objective Reinforcement Learning

May 01, 2024

Yucheng Shi, Alexandros Agapitos, David Lynch, Giorgio Cruciata, Hao Wang, Yayu Yao, Aleksandar Milenovic

Abstract: In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours that trade off multiple, possibly conflicting, objectives. MORL based on decomposition is a family of solution methods that employ a number of utility functions to decompose the multi-objective problem into individual single-objective problems, which are solved simultaneously in order to approximate a Pareto front of policies. We focus on the case of linear utility functions parameterised by weight vectors w. We introduce a method based on Upper Confidence Bound (UCB) to efficiently search for the most promising weight vectors during different stages of the learning process, with the aim of maximising the hypervolume of the resulting Pareto front. The proposed method is shown to outperform various MORL baselines on MuJoCo benchmark problems across different random seeds. The code is online at: https://github.com/SYCAMORE-1/ucb-MOPPO.
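
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository linked above for that). It assumes two objectives, a standard UCB1 rule over a fixed set of candidate weight vectors, and hypervolume gain as the bandit reward. The names `toy_train_step`, `search`, and `ucb_select`, the exploration constant `c`, and the reference point are all hypothetical choices made for illustration; `toy_train_step` stands in for a real policy update.

```python
import math
import random


def scalarise(returns, w):
    """Linear utility: dot product of a return vector and a weight vector w."""
    return sum(r * wi for r, wi in zip(returns, w))


def hypervolume_2d(points, ref):
    """Area dominated by `points` relative to `ref` (two objectives, maximisation)."""
    # Keep points that strictly dominate the reference, sorted by objective 1 descending.
    pts = sorted((p for p in points if p[0] > ref[0] and p[1] > ref[1]),
                 key=lambda p: -p[0])
    area, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:  # only points that extend the front add area
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


def ucb_select(counts, means, t, c=1.0):
    """UCB1: try every arm once, then pick the highest mean plus exploration bonus."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [m + c * math.sqrt(2.0 * math.log(t) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def toy_train_step(w, rng):
    """Hypothetical stand-in for a policy update: returns a noisy point on a
    quarter-circle front at the angle that maximises scalarise(., w)."""
    theta = math.atan2(w[1], w[0])
    return (math.cos(theta) + rng.uniform(-0.05, 0.05),
            math.sin(theta) + rng.uniform(-0.05, 0.05))


def search(weights, steps, ref=(0.0, 0.0), seed=0):
    """Each arm is a weight vector; the reward is the hypervolume gain it yields."""
    rng = random.Random(seed)
    counts = [0] * len(weights)
    means = [0.0] * len(weights)
    front, hv = [], 0.0
    for t in range(1, steps + 1):
        i = ucb_select(counts, means, t)
        front.append(toy_train_step(weights[i], rng))
        new_hv = hypervolume_2d(front, ref)
        gain, hv = new_hv - hv, new_hv
        counts[i] += 1
        means[i] += (gain - means[i]) / counts[i]  # incremental mean of rewards
    return counts, hv


if __name__ == "__main__":
    candidate_weights = [(0.25, 0.75), (0.5, 0.5), (0.75, 0.25)]
    pulls, final_hv = search(candidate_weights, steps=30)
    print("pulls per weight vector:", pulls)
    print("final hypervolume:", round(final_hv, 3))
```

In this toy setting the hypervolume gain shrinks quickly once each region of the front has been visited, so the exploration bonus ends up driving most selections; a real training loop would produce gains that vary across learning stages, which is the situation the abstract describes.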

Offline Contextual Bandits for Wireless Network Optimization

Nov 11, 2021

Miguel Suau, Alexandros Agapitos, David Lynch, Derek Farrell, Mingqi Zhou, Aleksandar Milenovic

Abstract: The explosion in mobile data traffic, together with ever-increasing expectations for higher quality of service, calls for the development of AI algorithms for wireless network optimization. In this paper, we investigate how to learn policies that can automatically adjust the configuration parameters of every cell in the network in response to changes in user demand. Our solution combines existing methods for offline learning and adapts them in a principled way to overcome crucial challenges arising in this context. Empirical results suggest that our proposed method will achieve important performance gains when deployed in the real network while satisfying practical constraints on computational efficiency.
