Mustafa Mert Çelikok

Department of Computer Science, Aalto University

SHARPIE: A Modular Framework for Reinforcement Learning and Human-AI Interaction Experiments

Jan 31, 2025

Hüseyin Aydın, Kevin Dubois-Godin, Libio Goncalvez Braz, Floris den Hengst, Kim Baraka, Mustafa Mert Çelikok, Andreas Sauter, Shihan Wang, Frans A. Oliehoek

Abstract: Reinforcement learning (RL) offers a general approach for modeling and training AI agents, including in human-AI interaction scenarios. In this paper, we propose SHARPIE (Shared Human-AI Reinforcement Learning Platform for Interactive Experiments) to address the need for a generic framework to support experiments with RL agents and humans. Its modular design consists of a versatile wrapper for RL environments and algorithm libraries, a participant-facing web interface, logging utilities, and support for deployment on popular cloud and participant recruitment platforms. It empowers researchers to study a wide variety of research questions related to the interaction between humans and RL agents, including those related to interactive reward specification and learning, learning from human feedback, action delegation, preference elicitation, user modeling, and human-AI teaming. The platform is based on a generic interface for human-RL interactions that aims to standardize the field of study on RL in human contexts.

On the Complexity of Learning to Cooperate with Populations of Socially Rational Agents

Jun 29, 2024

Robert Loftin, Saptarashmi Bandyopadhyay, Mustafa Mert Çelikok

Abstract: Artificially intelligent agents deployed in the real world will require the ability to reliably cooperate with humans (as well as other, heterogeneous AI agents). To provide formal guarantees of successful cooperation, we must make some assumptions about how partner agents could plausibly behave. Any realistic set of assumptions must account for the fact that other agents may be just as adaptable as our agent is. In this work, we consider the problem of cooperating with a population of agents in a finitely-repeated, two-player general-sum matrix game with private utilities. Two natural assumptions in such settings are that: 1) all agents in the population are individually rational learners, and 2) when any two members of the population are paired together, with high probability they will achieve at least the same utility as they would under some Pareto efficient equilibrium strategy. Our results first show that these assumptions alone are insufficient to ensure zero-shot cooperation with members of the target population. We therefore consider the problem of learning a strategy for cooperating with such a population using prior observations of its members interacting with one another. We provide upper and lower bounds on the number of samples needed to learn an effective cooperation strategy. Most importantly, we show that these bounds can be much stronger than those arising from a "naive" reduction of the problem to one of imitation learning.

Inverse Concave-Utility Reinforcement Learning is Inverse Game Theory

May 29, 2024

Mustafa Mert Çelikok, Frans A. Oliehoek, Jan-Willem van de Meent

Abstract: We consider inverse reinforcement learning problems with concave utilities. Concave Utility Reinforcement Learning (CURL) is a generalisation of the standard RL objective, which employs a concave function of the state occupancy measure, rather than a linear function. CURL has garnered recent attention for its ability to represent instances of many important applications, including standard RL, imitation learning, pure exploration, constrained MDPs, offline RL, human-regularized RL, and others. Inverse reinforcement learning is a powerful paradigm that focuses on recovering an unknown reward function that can rationalize the observed behaviour of an agent. There have been recent theoretical advances in inverse RL where the problem is formulated as identifying the set of feasible reward functions. However, inverse RL for CURL problems has not been considered previously. In this paper we show that most of the standard IRL results do not apply to CURL in general, since CURL invalidates the classical Bellman equations. This calls for a new theoretical framework for the inverse CURL problem. Using a recent equivalence result between CURL and mean-field games, we propose a new definition for the feasible rewards for I-CURL by proving that this problem is equivalent to an inverse game theory problem in a subclass of mean-field games. We present initial query and sample complexity results for the I-CURL problem under assumptions such as Lipschitz continuity. Finally, we outline future directions and applications in human-AI collaboration enabled by our results.

Towards a Unifying Model of Rationality in Multiagent Systems

May 29, 2023

Robert Loftin, Mustafa Mert Çelikok, Frans A. Oliehoek

Abstract: Multiagent systems deployed in the real world need to cooperate with other agents (including humans) nearly as effectively as these agents cooperate with one another. To design such AI, and provide guarantees of its effectiveness, we need to clearly specify what types of agents our AI must be able to cooperate with. In this work we propose a generic model of socially intelligent agents, which are individually rational learners that are also able to cooperate with one another (in the sense that their joint behavior is Pareto efficient). We define rationality in terms of the regret incurred by each agent over its lifetime, and show how we can construct socially intelligent agents for different forms of regret. We then discuss the implications of this model for the development of "robust" MAS that can cooperate with a wide variety of socially intelligent agents.

* 5 pages, to appear in the OptLearnMAS Workshop at AAMAS 2023

Uncoupled Learning of Differential Stackelberg Equilibria with Commitments

Feb 07, 2023

Robert Loftin, Mustafa Mert Çelikok, Herke van Hoof, Samuel Kaski, Frans A. Oliehoek

Abstract: A natural solution concept for many multiagent settings is the Stackelberg equilibrium, under which a "leader" agent selects a strategy that maximizes its own payoff assuming the "follower" chooses their best response to this strategy. Recent work has presented asymmetric learning updates that can be shown to converge to the differential Stackelberg equilibria of two-player differentiable games. These updates are "coupled" in the sense that the leader requires some information about the follower's payoff function. Such coupled learning rules cannot be applied to ad hoc interactive learning settings, and can be computationally impractical even in centralized training settings where the follower's payoffs are known. In this work, we present an "uncoupled" learning process under which each player's learning update only depends on their observations of the other's behavior. We prove that this process converges to a local Stackelberg equilibrium under conditions similar to those of previous coupled methods. We conclude with a discussion of the potential applications of our approach to human-AI cooperation and multi-agent reinforcement learning.

Differentiable User Models

Nov 29, 2022

Alex Hämäläinen, Mustafa Mert Çelikok, Samuel Kaski

Abstract: Probabilistic user modeling is essential for building collaborative AI systems within probabilistic frameworks. However, modern advanced user models, often designed as cognitive behavior simulators, are computationally prohibitive for interactive use in cooperative AI assistants. In this extended abstract, we address this problem by introducing widely applicable differentiable surrogates for bypassing this computational bottleneck; the surrogates enable using modern behavioral models with an online computational cost that is independent of their original computational cost. We show experimentally that modeling capabilities comparable to likelihood-free inference methods are achievable, with over eight orders of magnitude reduction in computational time. Finally, we demonstrate how AI assistants can feasibly use cognitive models in a previously studied menu-search task.

* This is an extended abstract accepted for presentation at the NeurIPS 2022 HILL workshop

Distributed Influence-Augmented Local Simulators for Parallel MARL in Large Networked Systems

Jul 01, 2022

Miguel Suau, Jinke He, Mustafa Mert Çelikok, Matthijs T. J. Spaan, Frans A. Oliehoek

Abstract: Due to the high sample complexity of reinforcement learning, simulation is, as of today, critical for its successful application. Many real-world problems, however, exhibit overly complex dynamics, which makes their full-scale simulation computationally slow. In this paper, we show how to decompose large networked systems of many agents into multiple local components such that we can build separate simulators that run independently and in parallel. To monitor the influence that the different local components exert on one another, each of these simulators is equipped with a learned model that is periodically trained on real trajectories. Our empirical results reveal that distributing the simulation among different processes not only makes it possible to train large multi-agent systems in just a few hours but also helps mitigate the negative effects of simultaneous learning.

Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs

Apr 03, 2022

Mustafa Mert Çelikok, Frans A. Oliehoek, Samuel Kaski

Abstract: Centaurs are half-human, half-AI decision-makers where the AI's goal is to complement the human. To do so, the AI must be able to recognize the goals and constraints of the human and have the means to help them. We present a novel formulation of the interaction between the human and the AI as a sequential game where the agents are modelled using Bayesian best-response models. We show that in this case the AI's problem of helping bounded-rational humans make better decisions reduces to a Bayes-adaptive POMDP. In our simulated experiments, we consider an instantiation of our framework for humans who are subjectively optimistic about the AI's future behaviour. Our results show that when equipped with a model of the human, the AI can infer the human's bounds and nudge them towards better decisions. We also discuss ways in which the machine can learn to overcome its own limitations with the help of the human. We identify a novel trade-off for centaurs in partially observable tasks: for the AI's actions to be acceptable to the human, the machine must make sure their beliefs are sufficiently aligned, but aligning beliefs might be costly. We present a preliminary theoretical analysis of this trade-off and its dependence on task structure.

* This paper is presented in part at the International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS) 2022

Interactive AI with a Theory of Mind

Dec 01, 2019

Mustafa Mert Çelikok, Tomi Peltola, Pedram Daee, Samuel Kaski

Abstract: Understanding each other is the key to success in collaboration. For humans, attributing mental states to others, the theory of mind, provides the crucial advantage. We argue for formulating human-AI interaction as a multi-agent problem, endowing AI with a computational theory of mind to understand and anticipate the user. To differentiate the approach from previous work, we introduce a categorisation of user modelling approaches based on the level of agency learnt in the interaction. We describe our recent work in using nested multi-agent modelling to formulate user models for multi-armed bandit based interactive AI systems, including a proof-of-concept user study.

* This is a slightly updated version of a manuscript that appeared in ACM CHI 2019 Workshop: Computational Modeling in Human-Computer Interaction

Modelling User's Theory of AI's Mind in Interactive Intelligent Systems

Sep 08, 2018

Tomi Peltola, Mustafa Mert Çelikok, Pedram Daee, Samuel Kaski

Abstract: Many interactive intelligent systems, such as recommendation and information retrieval systems, treat users as a passive data source. Yet users form mental models of systems, and instead of passively providing feedback to the queries of the system, they will strategically plan their actions within the constraints of the mental model to steer the system and achieve their goals faster. We propose to explicitly account for the user's theory of the AI's mind in the user model: the intelligent system has a model of the user having a model of the intelligent system. We study a case where the system is a contextual bandit and the user model is a Markov decision process that plans based on a simpler model of the bandit. Inference in the model can be reduced to probabilistic inverse reinforcement learning, with the nested bandit model defining the transition dynamics, and is implemented using probabilistic programming. Our results show that improved performance is achieved if users can form accurate mental models that the system can capture, implying that predictability of the interactive intelligent system is important not only for the user experience but also for the design of the system's statistical models.

* 18 pages, 9 figures
