Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Setareh Maghsudi

Stochastic Multi-Objective Multi-Armed Bandits: Regret Definition and Algorithm

Jun 16, 2025

Mansoor Davoodi, Setareh Maghsudi

Abstract:Multi-armed bandit (MAB) problems are widely applied to online optimization tasks that require balancing exploration and exploitation. In practical scenarios, these tasks often involve multiple conflicting objectives, giving rise to multi-objective multi-armed bandits (MO-MAB). Existing MO-MAB approaches predominantly rely on the Pareto regret metric introduced in \cite{drugan2013designing}. However, this metric has notable limitations, particularly in accounting for all Pareto-optimal arms simultaneously. To address these challenges, we propose a novel and comprehensive regret metric that ensures balanced performance across conflicting objectives. Additionally, we introduce the concept of \textit{Efficient Pareto-Optimal} arms, which are specifically designed for online optimization. Based on our new metric, we develop a two-phase MO-MAB algorithm that achieves sublinear regret for both Pareto-optimal and efficient Pareto-optimal arms.

Via

Access Paper or Ask Questions

Quantum-Inspired Reinforcement Learning in the Presence of Epistemic Ambivalence

Mar 06, 2025

Alireza Habibi, Saeed Ghoorchian, Setareh Maghsudi

Abstract:The complexity of online decision-making under uncertainty stems from the requirement of finding a balance between exploiting known strategies and exploring new possibilities. Naturally, the uncertainty type plays a crucial role in developing decision-making strategies that manage complexity effectively. In this paper, we focus on a specific form of uncertainty known as epistemic ambivalence (EA), which emerges from conflicting pieces of evidence or contradictory experiences. It creates a delicate interplay between uncertainty and confidence, distinguishing it from epistemic uncertainty that typically diminishes with new information. Indeed, ambivalence can persist even after additional knowledge is acquired. To address this phenomenon, we propose a novel framework, called the epistemically ambivalent Markov decision process (EA-MDP), aiming to understand and control EA in decision-making processes. This framework incorporates the concept of a quantum state from the quantum mechanics formalism, and its core is to assess the probability and reward of every possible outcome. We calculate the reward function using quantum measurement techniques and prove the existence of an optimal policy and an optimal value function in the EA-MDP framework. We also propose the EA-epsilon-greedy Q-learning algorithm. To evaluate the impact of EA on decision-making and the expedience of our framework, we study two distinct experimental setups, namely the two-state problem and the lattice problem. Our results show that using our methods, the agent converges to the optimal policy in the presence of EA.

Via

Access Paper or Ask Questions

Optimal User and Target Scheduling, User-Target Pairing, and Low-Resolution Phase-Only Beamforming for ISAC Systems

Jan 20, 2025

Luis F. Abanto-Leon, Setareh Maghsudi

Abstract:We investigate the joint user and target scheduling, user-target pairing, and low-resolution phase-only beamforming design for integrated sensing and communications (ISAC). Scheduling determines which users and targets are served, while pairing specifies which users and targets are grouped into pairs. Additionally, the beamformers are designed using few-bit constant-modulus phase shifts. This resource allocation problem is a nonconvex mixed-integer nonlinear program (MINLP) and challenging to solve. To address it, we propose an exact mixed-integer linear program (MILP) reformulation, which leads to a globally optimal solution. Our results demonstrate the superiority of an optimal joint design compared to heuristic stage-wise approaches, which are highly sensitive to scenario characteristics.

* IEEE Transactions on Vehicular Technology

Via

Access Paper or Ask Questions

Hierarchical Functionality Prioritization in Multicast ISAC: Optimal Admission Control and Discrete-Phase Beamforming

Dec 31, 2024

Luis F. Abanto-Leon, Setareh Maghsudi

Abstract:We investigate the joint admission control and discrete-phase multicast beamforming design for integrated sensing and communications (ISAC) systems, where sensing and communications functionalities have different hierarchies. Specifically, the ISAC system first allocates resources to the higher-hierarchy functionality and opportunistically uses the remaining resources to support the lower-hierarchy one. This resource allocation problem is a nonconvex mixed-integer nonlinear program (MINLP). We propose an exact mixed-integer linear program (MILP) reformulation, leading to a globally optimal solution. In addition, we implemented three baselines for comparison, which our proposed method outperforms by more than 39%.

* IEEE Communications Letters, 2024
* 5 pages

Via

Access Paper or Ask Questions

From Code to Play: Benchmarking Program Search for Games Using Large Language Models

Dec 05, 2024

Manuel Eberhardinger, James Goodman, Alexander Dockhorn, Diego Perez-Liebana, Raluca D. Gaina, Duygu Çakmak, Setareh Maghsudi, Simon Lucas

Figure 1 for From Code to Play: Benchmarking Program Search for Games Using Large Language Models

Figure 2 for From Code to Play: Benchmarking Program Search for Games Using Large Language Models

Figure 3 for From Code to Play: Benchmarking Program Search for Games Using Large Language Models

Figure 4 for From Code to Play: Benchmarking Program Search for Games Using Large Language Models

Abstract:Large language models (LLMs) have shown impressive capabilities in generating program code, opening exciting opportunities for applying program synthesis to games. In this work, we explore the potential of LLMs to directly synthesize usable code for a wide range of gaming applications, focusing on two programming languages, Python and Java. We use an evolutionary hill-climbing algorithm, where the mutations and seeds of the initial programs are controlled by LLMs. For Python, the framework covers various game-related tasks, including five miniature versions of Atari games, ten levels of Baba is You, an environment inspired by Asteroids, and a maze generation task. For Java, the framework contains 12 games from the TAG tabletop games framework. Across 29 tasks, we evaluated 12 language models for Python and 8 for Java. Our findings suggest that the performance of LLMs depends more on the task than on model size. While larger models generate more executable programs, these do not always result in higher-quality solutions but are much more expensive. No model has a clear advantage, although on any specific task, one model may be better. Trying many models on a problem and using the best results across them is more reliable than using just one.

* Submitted to Transactions on Games Special Issue on Large Language Models and Games

Via

Access Paper or Ask Questions

Unveiling the Decision-Making Process in Reinforcement Learning with Genetic Programming

Jul 20, 2024

Manuel Eberhardinger, Florian Rupp, Johannes Maucher, Setareh Maghsudi

Figure 1 for Unveiling the Decision-Making Process in Reinforcement Learning with Genetic Programming

Figure 2 for Unveiling the Decision-Making Process in Reinforcement Learning with Genetic Programming

Figure 3 for Unveiling the Decision-Making Process in Reinforcement Learning with Genetic Programming

Figure 4 for Unveiling the Decision-Making Process in Reinforcement Learning with Genetic Programming

Abstract:Despite tremendous progress, machine learning and deep learning still suffer from incomprehensible predictions. Incomprehensibility, however, is not an option for the use of (deep) reinforcement learning in the real world, as unpredictable actions can seriously harm the involved individuals. In this work, we propose a genetic programming framework to generate explanations for the decision-making process of already trained agents by imitating them with programs. Programs are interpretable and can be executed to generate explanations of why the agent chooses a particular action. Furthermore, we conduct an ablation study that investigates how extending the domain-specific language by using library learning alters the performance of the method. We compare our results with the previous state of the art for this problem and show that we are comparable in performance but require much less hardware resources and computation time.

* Accepted at: The Fifteenth International Conference on Swarm Intelligence (ICSI'2024)

Via

Access Paper or Ask Questions

Decentralized Task Offloading and Load-Balancing for Mobile Edge Computing in Dense Networks

Jun 24, 2024

Mariam Yahya, Alexander Conzelmann, Setareh Maghsudi

Abstract:We study the problem of decentralized task offloading and load-balancing in a dense network with numerous devices and a set of edge servers. Solving this problem optimally is complicated due to the unknown network information and random task sizes. The shared network resources also influence the users' decisions and resource distribution. Our solution combines the mean field multi-agent multi-armed bandit (MAB) game with a load-balancing technique that adjusts the servers' rewards to achieve a target population profile despite the distributed user decision-making. Numerical results demonstrate the efficacy of our approach and the convergence to the target load distribution.

Via

Access Paper or Ask Questions

Distributed Management of Fluctuating Energy Resources in Dynamic Networked Systems

May 29, 2024

Xiaotong Cheng, Ioannis Tsetis, Setareh Maghsudi

Abstract:Modern power systems integrate renewable distributed energy resources (DERs) as an environment-friendly enhancement to meet the ever-increasing demands. However, the inherent unreliability of renewable energy renders developing DER management algorithms imperative. We study the energy-sharing problem in a system consisting of several DERs. Each agent harvests and distributes renewable energy in its neighborhood to optimize the network's performance while minimizing energy waste. We model this problem as a bandit convex optimization problem with constraints that correspond to each node's limitations for energy production. We propose distributed decision-making policies to solve the formulated problem, where we utilize the notion of dynamic regret as the performance metric. We also include an adjustment strategy in our developed algorithm to reduce the constraint violations. Besides, we design a policy that deals with the non-stationary environment. Theoretical analysis shows the effectiveness of our proposed algorithm. Numerical experiments using a real-world dataset show superior performance of our proposal compared to state-of-the-art methods.

Via

Access Paper or Ask Questions

Budgeted Recommendation with Delayed Feedback

May 19, 2024

Kweiguu Liu, Setareh Maghsudi

Abstract:In a conventional contextual multi-armed bandit problem, the feedback (or reward) is immediately observable after an action. Nevertheless, delayed feedback arises in numerous real-life situations and is particularly crucial in time-sensitive applications. The exploration-exploitation dilemma becomes particularly challenging under such conditions, as it couples with the interplay between delays and limited resources. Besides, a limited budget often aggravates the problem by restricting the exploration potential. A motivating example is the distribution of medical supplies at the early stage of COVID-19. The delayed feedback of testing results, thus insufficient information for learning, degraded the efficiency of resource allocation. Motivated by such applications, we study the effect of delayed feedback on constrained contextual bandits. We develop a decision-making policy, delay-oriented resource allocation with learning (DORAL), to optimize the resource expenditure in a contextual multi-armed bandit problem with arm-dependent delayed feedback.

Via

Access Paper or Ask Questions

Adaptive Regularization of Representation Rank as an Implicit Constraint of Bellman Equation

Apr 19, 2024

Qiang He, Tianyi Zhou, Meng Fang, Setareh Maghsudi

Abstract:Representation rank is an important concept for understanding the role of Neural Networks (NNs) in Deep Reinforcement learning (DRL), which measures the expressive capacity of value networks. Existing studies focus on unboundedly maximizing this rank; nevertheless, that approach would introduce overly complex models in the learning, thus undermining performance. Hence, fine-tuning representation rank presents a challenging and crucial optimization problem. To address this issue, we find a guiding principle for adaptive control of the representation rank. We employ the Bellman equation as a theoretical foundation and derive an upper bound on the cosine similarity of consecutive state-action pairs representations of value networks. We then leverage this upper bound to propose a novel regularizer, namely BEllman Equation-based automatic rank Regularizer (BEER). This regularizer adaptively regularizes the representation rank, thus improving the DRL agent's performance. We first validate the effectiveness of automatic control of rank on illustrative experiments. Then, we scale up BEER to complex continuous control tasks by combining it with the deterministic policy gradient method. Among 12 challenging DeepMind control tasks, BEER outperforms the baselines by a large margin. Besides, BEER demonstrates significant advantages in Q-value approximation. Our code is available at https://github.com/sweetice/BEER-ICLR2024.

* Accepted to CVPR23; Code: https://github.com/sweetice/BEER-ICLR2024

Via

Access Paper or Ask Questions