Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Michael M. Zavlanos

Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration

May 08, 2025

Andreas Kontogiannis, Konstantinos Papathanasiou, Yi Shen, Giorgos Stamou, Michael M. Zavlanos, George Vouros

Abstract:Learning to cooperate in distributed partially observable environments with no communication abilities poses significant challenges for multi-agent deep reinforcement learning (MARL). This paper addresses key concerns in this domain, focusing on inferring state representations from individual agent observations and leveraging these representations to enhance agents' exploration and collaborative task execution policies. To this end, we propose a novel state modelling framework for cooperative MARL, where agents infer meaningful belief representations of the non-observable state, with respect to optimizing their own policies, while filtering redundant and less informative joint state information. Building upon this framework, we propose the MARL SMPE algorithm. In SMPE, agents enhance their own policy's discriminative abilities under partial observability, explicitly by incorporating their beliefs into the policy network, and implicitly by adopting an adversarial type of exploration policies which encourages agents to discover novel, high-value states while improving the discriminative abilities of others. Experimentally, we show that SMPE outperforms state-of-the-art MARL algorithms in complex fully cooperative tasks from the MPE, LBF, and RWARE benchmarks.

* Accepted (Poster) at ICML 2025

Via

Access Paper or Ask Questions

LAPP: Large Language Model Feedback for Preference-Driven Reinforcement Learning

Apr 21, 2025

Pingcheng Jian, Xiao Wei, Yanbaihui Liu, Samuel A. Moore, Michael M. Zavlanos, Boyuan Chen

Abstract:We introduce Large Language Model-Assisted Preference Prediction (LAPP), a novel framework for robot learning that enables efficient, customizable, and expressive behavior acquisition with minimum human effort. Unlike prior approaches that rely heavily on reward engineering, human demonstrations, motion capture, or expensive pairwise preference labels, LAPP leverages large language models (LLMs) to automatically generate preference labels from raw state-action trajectories collected during reinforcement learning (RL). These labels are used to train an online preference predictor, which in turn guides the policy optimization process toward satisfying high-level behavioral specifications provided by humans. Our key technical contribution is the integration of LLMs into the RL feedback loop through trajectory-level preference prediction, enabling robots to acquire complex skills including subtle control over gait patterns and rhythmic timing. We evaluate LAPP on a diverse set of quadruped locomotion and dexterous manipulation tasks and show that it achieves efficient learning, higher final performance, faster adaptation, and precise control of high-level behaviors. Notably, LAPP enables robots to master highly dynamic and expressive tasks such as quadruped backflips, which remain out of reach for standard LLM-generated or handcrafted rewards. Our results highlight LAPP as a promising direction for scalable preference-driven robot learning.

Via

Access Paper or Ask Questions

Distributionally Robust Multi-Agent Reinforcement Learning for Dynamic Chute Mapping

Mar 12, 2025

Guangyi Liu, Suzan Iloglu, Michael Caldara, Joseph W. Durham, Michael M. Zavlanos

Abstract:In Amazon robotic warehouses, the destination-to-chute mapping problem is crucial for efficient package sorting. Often, however, this problem is complicated by uncertain and dynamic package induction rates, which can lead to increased package recirculation. To tackle this challenge, we introduce a Distributionally Robust Multi-Agent Reinforcement Learning (DRMARL) framework that learns a destination-to-chute mapping policy that is resilient to adversarial variations in induction rates. Specifically, DRMARL relies on group distributionally robust optimization (DRO) to learn a policy that performs well not only on average but also on each individual subpopulation of induction rates within the group that capture, for example, different seasonality or operation modes of the system. This approach is then combined with a novel contextual bandit-based predictor of the worst-case induction distribution for each state-action pair, significantly reducing the cost of exploration and thereby increasing the learning efficiency and scalability of our framework. Extensive simulations demonstrate that DRMARL achieves robust chute mapping in the presence of varying induction distributions, reducing package recirculation by an average of 80\% in the simulation scenario.

Via

Access Paper or Ask Questions

Inverse Reinforcement Learning from Non-Stationary Learning Agents

Oct 18, 2024

Kavinayan P. Sivakumar, Yi Shen, Zachary Bell, Scott Nivison, Boyuan Chen, Michael M. Zavlanos

Figure 1 for Inverse Reinforcement Learning from Non-Stationary Learning Agents

Figure 2 for Inverse Reinforcement Learning from Non-Stationary Learning Agents

Figure 3 for Inverse Reinforcement Learning from Non-Stationary Learning Agents

Figure 4 for Inverse Reinforcement Learning from Non-Stationary Learning Agents

Abstract:In this paper, we study an inverse reinforcement learning problem that involves learning the reward function of a learning agent using trajectory data collected while this agent is learning its optimal policy. To address this problem, we propose an inverse reinforcement learning method that allows us to estimate the policy parameters of the learning agent which can then be used to estimate its reward function. Our method relies on a new variant of the behavior cloning algorithm, which we call bundle behavior cloning, and uses a small number of trajectories generated by the learning agent's policy at different points in time to learn a set of policies that match the distribution of actions observed in the sampled trajectories. We then use the cloned policies to train a neural network model that estimates the reward function of the learning agent. We provide a theoretical analysis to show a complexity result on bound guarantees for our method that beats standard behavior cloning as well as numerical experiments for a reinforcement learning problem that validate the proposed method.

Via

Access Paper or Ask Questions

Transfer Reinforcement Learning in Heterogeneous Action Spaces using Subgoal Mapping

Oct 18, 2024

Kavinayan P. Sivakumar, Yan Zhang, Zachary Bell, Scott Nivison, Michael M. Zavlanos

Figure 1 for Transfer Reinforcement Learning in Heterogeneous Action Spaces using Subgoal Mapping

Figure 2 for Transfer Reinforcement Learning in Heterogeneous Action Spaces using Subgoal Mapping

Figure 3 for Transfer Reinforcement Learning in Heterogeneous Action Spaces using Subgoal Mapping

Figure 4 for Transfer Reinforcement Learning in Heterogeneous Action Spaces using Subgoal Mapping

Abstract:In this paper, we consider a transfer reinforcement learning problem involving agents with different action spaces. Specifically, for any new unseen task, the goal is to use a successful demonstration of this task by an expert agent in its action space to enable a learner agent learn an optimal policy in its own different action space with fewer samples than those required if the learner was learning on its own. Existing transfer learning methods across different action spaces either require handcrafted mappings between those action spaces provided by human experts, which can induce bias in the learning procedure, or require the expert agent to share its policy parameters with the learner agent, which does not generalize well to unseen tasks. In this work, we propose a method that learns a subgoal mapping between the expert agent policy and the learner agent policy. Since the expert agent and the learner agent have different action spaces, their optimal policies can have different subgoal trajectories. We learn this subgoal mapping by training a Long Short Term Memory (LSTM) network for a distribution of tasks and then use this mapping to predict the learner subgoal sequence for unseen tasks, thereby improving the speed of learning by biasing the agent's policy towards the predicted learner subgoal sequence. Through numerical experiments, we demonstrate that the proposed learning scheme can effectively find the subgoal mapping underlying the given distribution of tasks. Moreover, letting the learner agent imitate the expert agent's policy with the learnt subgoal mapping can significantly improve the sample efficiency and training time of the learner agent in unseen new tasks.

Via

Access Paper or Ask Questions

Distributionally Robust Clustered Federated Learning: A Case Study in Healthcare

Oct 09, 2024

Xenia Konti, Hans Riess, Manos Giannopoulos, Yi Shen, Michael J. Pencina, Nicoleta J. Economou-Zavlanos, Michael M. Zavlanos

Figure 1 for Distributionally Robust Clustered Federated Learning: A Case Study in Healthcare

Figure 2 for Distributionally Robust Clustered Federated Learning: A Case Study in Healthcare

Figure 3 for Distributionally Robust Clustered Federated Learning: A Case Study in Healthcare

Abstract:In this paper, we address the challenge of heterogeneous data distributions in cross-silo federated learning by introducing a novel algorithm, which we term Cross-silo Robust Clustered Federated Learning (CS-RCFL). Our approach leverages the Wasserstein distance to construct ambiguity sets around each client's empirical distribution that capture possible distribution shifts in the local data, enabling evaluation of worst-case model performance. We then propose a model-agnostic integer fractional program to determine the optimal distributionally robust clustering of clients into coalitions so that possible biases in the local models caused by statistically heterogeneous client datasets are avoided, and analyze our method for linear and logistic regression models. Finally, we discuss a federated learning protocol that ensures the privacy of client distributions, a critical consideration, for instance, when clients are healthcare institutions. We evaluate our algorithm on synthetic and real-world healthcare data.

* 8 pages, 3 figures, Accepted to IEEE CDC 2024

Via

Access Paper or Ask Questions

Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies

Jun 28, 2024

Pingcheng Jian, Easop Lee, Zachary Bell, Michael M. Zavlanos, Boyuan Chen

Figure 1 for Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies

Figure 2 for Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies

Figure 3 for Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies

Figure 4 for Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies

Abstract:Vision-based imitation learning has shown promising capabilities of endowing robots with various motion skills given visual observation. However, current visuomotor policies fail to adapt to drastic changes in their visual observations. We present Perception Stitching that enables strong zero-shot adaptation to large visual changes by directly stitching novel combinations of visual encoders. Our key idea is to enforce modularity of visual encoders by aligning the latent visual features among different visuomotor policies. Our method disentangles the perceptual knowledge with the downstream motion skills and allows the reuse of the visual encoders by directly stitching them to a policy network trained with partially different visual conditions. We evaluate our method in various simulated and real-world manipulation tasks. While baseline methods failed at all attempts, our method could achieve zero-shot success in real-world visuomotor tasks. Our quantitative and qualitative analysis of the learned features of the policy network provides more insights into the high performance of our proposed method.

Via

Access Paper or Ask Questions

Risk-averse Learning with Non-Stationary Distributions

Apr 03, 2024

Siyi Wang, Zifan Wang, Xinlei Yi, Michael M. Zavlanos, Karl H. Johansson, Sandra Hirche

Figure 1 for Risk-averse Learning with Non-Stationary Distributions

Figure 2 for Risk-averse Learning with Non-Stationary Distributions

Figure 3 for Risk-averse Learning with Non-Stationary Distributions

Figure 4 for Risk-averse Learning with Non-Stationary Distributions

Abstract:Considering non-stationary environments in online optimization enables decision-maker to effectively adapt to changes and improve its performance over time. In such cases, it is favorable to adopt a strategy that minimizes the negative impact of change to avoid potentially risky situations. In this paper, we investigate risk-averse online optimization where the distribution of the random cost changes over time. We minimize risk-averse objective function using the Conditional Value at Risk (CVaR) as risk measure. Due to the difficulty in obtaining the exact CVaR gradient, we employ a zeroth-order optimization approach that queries the cost function values multiple times at each iteration and estimates the CVaR gradient using the sampled values. To facilitate the regret analysis, we use a variation metric based on Wasserstein distance to capture time-varying distributions. Given that the distribution variation is sub-linear in the total number of episodes, we show that our designed learning algorithm achieves sub-linear dynamic regret with high probability for both convex and strongly convex functions. Moreover, theoretical results suggest that increasing the number of samples leads to a reduction in the dynamic regret bounds until the sampling number reaches a specific limit. Finally, we provide numerical experiments of dynamic pricing in a parking lot to illustrate the efficacy of the designed algorithm.

Via

Access Paper or Ask Questions

Path Signatures and Graph Neural Networks for Slow Earthquake Analysis: Better Together?

Feb 05, 2024

Hans Riess, Manolis Veveakis, Michael M. Zavlanos

Abstract:The path signature, having enjoyed recent success in the machine learning community, is a theoretically-driven method for engineering features from irregular paths. On the other hand, graph neural networks (GNN), neural architectures for processing data on graphs, excel on tasks with irregular domains, such as sensor networks. In this paper, we introduce a novel approach, Path Signature Graph Convolutional Neural Networks (PS-GCNN), integrating path signatures into graph convolutional neural networks (GCNN), and leveraging the strengths of both path signatures, for feature extraction, and GCNNs, for handling spatial interactions. We apply our method to analyze slow earthquake sequences, also called slow slip events (SSE), utilizing data from GPS timeseries, with a case study on a GPS sensor network on the east coast of New Zealand's north island. We also establish benchmarks for our method on simulated stochastic differential equations, which model similar reaction-diffusion phenomenon. Our methodology shows promise for future advancement in earthquake prediction and sensor network analysis.

Via

Access Paper or Ask Questions

Policy Stitching: Learning Transferable Robot Policies

Sep 24, 2023

Pingcheng Jian, Easop Lee, Zachary Bell, Michael M. Zavlanos, Boyuan Chen

Figure 1 for Policy Stitching: Learning Transferable Robot Policies

Figure 2 for Policy Stitching: Learning Transferable Robot Policies

Figure 3 for Policy Stitching: Learning Transferable Robot Policies

Figure 4 for Policy Stitching: Learning Transferable Robot Policies

Abstract:Training robots with reinforcement learning (RL) typically involves heavy interactions with the environment, and the acquired skills are often sensitive to changes in task environments and robot kinematics. Transfer RL aims to leverage previous knowledge to accelerate learning of new tasks or new body configurations. However, existing methods struggle to generalize to novel robot-task combinations and scale to realistic tasks due to complex architecture design or strong regularization that limits the capacity of the learned policy. We propose Policy Stitching, a novel framework that facilitates robot transfer learning for novel combinations of robots and tasks. Our key idea is to apply modular policy design and align the latent representations between the modular interfaces. Our method allows direct stitching of the robot and task modules trained separately to form a new policy for fast adaptation. Our simulated and real-world experiments on various 3D manipulation tasks demonstrate the superior zero-shot and few-shot transfer learning performances of our method. Our project website is at: http://generalroboticslab.com/PolicyStitching/ .

* CoRL 2023

Via

Access Paper or Ask Questions