Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiaojin Zhu

Optimally Installing Strict Equilibria

Mar 05, 2025

Jeremy McMahan, Young Wu, Yudong Chen, Xiaojin Zhu, Qiaomin Xie

Abstract:In this work, we develop a reward design framework for installing a desired behavior as a strict equilibrium across standard solution concepts: dominant strategy equilibrium, Nash equilibrium, correlated equilibrium, and coarse correlated equilibrium. We also extend our framework to capture the Markov-perfect equivalents of each solution concept. Central to our framework is a comprehensive mathematical characterization of strictly installable, based on the desired solution concept and the behavior's structure. These characterizations lead to efficient iterative algorithms, which we generalize to handle optimization objectives through linear programming. Finally, we explore how our results generalize to bounded rational agents.

Via

Access Paper or Ask Questions

The Battling Influencers Game: Nash Equilibria Structure of a Potential Game and Implications to Value Alignment

Feb 03, 2025

Young Wu, Yancheng Zhu, Jin-Yi Cai, Xiaojin Zhu

Figure 1 for The Battling Influencers Game: Nash Equilibria Structure of a Potential Game and Implications to Value Alignment

Figure 2 for The Battling Influencers Game: Nash Equilibria Structure of a Potential Game and Implications to Value Alignment

Figure 3 for The Battling Influencers Game: Nash Equilibria Structure of a Potential Game and Implications to Value Alignment

Figure 4 for The Battling Influencers Game: Nash Equilibria Structure of a Potential Game and Implications to Value Alignment

Abstract:When multiple influencers attempt to compete for a receiver's attention, their influencing strategies must account for the presence of one another. We introduce the Battling Influencers Game (BIG), a multi-player simultaneous-move general-sum game, to provide a game-theoretic characterization of this social phenomenon. We prove that BIG is a potential game, that it has either one or an infinite number of pure Nash equilibria (NEs), and these pure NEs can be found by convex optimization. Interestingly, we also prove that at any pure NE, all (except at most one) influencers must exaggerate their actions to the maximum extent. In other words, it is rational for the influencers to be non-truthful and extreme because they anticipate other influencers to cancel out part of their influence. We discuss the implications of BIG to value alignment.

* 9 pages, 8 figures, submitted to ICML

Via

Access Paper or Ask Questions

Anytime-Constrained Multi-Agent Reinforcement Learning

Oct 31, 2024

Jeremy McMahan, Xiaojin Zhu

Abstract:We introduce anytime constraints to the multi-agent setting with the corresponding solution concept being anytime-constrained equilibrium (ACE). Then, we present a comprehensive theory of anytime-constrained Markov games, which includes (1) a computational characterization of feasible policies, (2) a fixed-parameter tractable algorithm for computing ACE, and (3) a polynomial-time algorithm for approximately computing feasible ACE. Since computing a feasible policy is NP-hard even for two-player zero-sum games, our approximation guarantees are the best possible under worst-case analysis. We also develop the first theory of efficient computation for action-constrained Markov games, which may be of independent interest.

Via

Access Paper or Ask Questions

CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models

Aug 26, 2024

Shubham Bharti, Shiyun Cheng, Jihyun Rho, Martina Rao, Xiaojin Zhu

Figure 1 for CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models

Figure 2 for CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models

Figure 3 for CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models

Figure 4 for CHARTOM: A Visual Theory-of-Mind Benchmark for Multimodal Large Language Models

Abstract:We introduce CHARTOM, a visual theory-of-mind benchmark for multimodal large language models. CHARTOM consists of specially designed data visualizing charts. Given a chart, a language model needs to not only correctly comprehend the chart (the FACT question) but also judge if the chart will be misleading to a human reader (the MIND question). Both questions have significant societal benefits. We detail the construction of the CHARTOM benchmark including its calibration on human performance.

Via

Access Paper or Ask Questions

Data Sharing for Mean Estimation Among Heterogeneous Strategic Agents

Jul 20, 2024

Alex Clinton, Yiding Chen, Xiaojin Zhu, Kirthevasan Kandasamy

Abstract:We study a collaborative learning problem where $m$ agents estimate a vector $\mu\in\mathbb{R}^d$ by collecting samples from normal distributions, with each agent $i$ incurring a cost $c_{i,k} \in (0, \infty]$ to sample from the $k^{\text{th}}$ distribution $\mathcal{N}(\mu_k, \sigma^2)$. Instead of working on their own, agents can collect data that is cheap to them, and share it with others in exchange for data that is expensive or even inaccessible to them, thereby simultaneously reducing data collection costs and estimation error. However, when agents have different collection costs, we need to first decide how to fairly divide the work of data collection so as to benefit all agents. Moreover, in naive sharing protocols, strategic agents may under-collect and/or fabricate data, leading to socially undesirable outcomes. Our mechanism addresses these challenges by combining ideas from cooperative and non-cooperative game theory. We use ideas from axiomatic bargaining to divide the cost of data collection. Given such a solution, we develop a Nash incentive-compatible (NIC) mechanism to enforce truthful reporting. We achieve a $\mathcal{O}(\sqrt{m})$ approximation to the minimum social penalty (sum of agent estimation errors and data collection costs) in the worst case, and a $\mathcal{O}(1)$ approximation under favorable conditions. We complement this with a hardness result, showing that $\Omega(\sqrt{m})$ is unavoidable in any NIC mechanism.

Via

Access Paper or Ask Questions

Inception: Efficiently Computable Misinformation Attacks on Markov Games

Jun 24, 2024

Jeremy McMahan, Young Wu, Yudong Chen, Xiaojin Zhu, Qiaomin Xie

Figure 1 for Inception: Efficiently Computable Misinformation Attacks on Markov Games

Figure 2 for Inception: Efficiently Computable Misinformation Attacks on Markov Games

Abstract:We study security threats to Markov games due to information asymmetry and misinformation. We consider an attacker player who can spread misinformation about its reward function to influence the robust victim player's behavior. Given a fixed fake reward function, we derive the victim's policy under worst-case rationality and present polynomial-time algorithms to compute the attacker's optimal worst-case policy based on linear programming and backward induction. Then, we provide an efficient inception ("planting an idea in someone's mind") attack algorithm to find the optimal fake reward function within a restricted set of reward functions with dominant strategies. Importantly, our methods exploit the universal assumption of rationality to compute attacks efficiently. Thus, our work exposes a security vulnerability arising from standard game assumptions under misinformation.

* Accepted to Reinforcement Learning Conference (RLC) 2024

Via

Access Paper or Ask Questions

Optimal Attack and Defense for Reinforcement Learning

Nov 30, 2023

Jeremy McMahan, Young Wu, Xiaojin Zhu, Qiaomin Xie

Figure 1 for Optimal Attack and Defense for Reinforcement Learning

Figure 2 for Optimal Attack and Defense for Reinforcement Learning

Figure 3 for Optimal Attack and Defense for Reinforcement Learning

Abstract:To ensure the usefulness of Reinforcement Learning (RL) in real systems, it is crucial to ensure they are robust to noise and adversarial attacks. In adversarial RL, an external attacker has the power to manipulate the victim agent's interaction with the environment. We study the full class of online manipulation attacks, which include (i) state attacks, (ii) observation attacks (which are a generalization of perceived-state attacks), (iii) action attacks, and (iv) reward attacks. We show the attacker's problem of designing a stealthy attack that maximizes its own expected reward, which often corresponds to minimizing the victim's value, is captured by a Markov Decision Process (MDP) that we call a meta-MDP since it is not the true environment but a higher level environment induced by the attacked interaction. We show that the attacker can derive optimal attacks by planning in polynomial time or learning with polynomial sample complexity using standard RL techniques. We argue that the optimal defense policy for the victim can be computed as the solution to a stochastic Stackelberg game, which can be further simplified into a partially-observable turn-based stochastic game (POTBSG). Neither the attacker nor the victim would benefit from deviating from their respective optimal policies, thus such solutions are truly robust. Although the defense problem is NP-hard, we show that optimal Markovian defenses can be computed (learned) in polynomial time (sample complexity) in many scenarios.

Via

Access Paper or Ask Questions

Optimally Teaching a Linear Behavior Cloning Agent

Nov 26, 2023

Shubham Kumar Bharti, Stephen Wright, Adish Singla, Xiaojin Zhu

Abstract:We study optimal teaching of Linear Behavior Cloning (LBC) learners. In this setup, the teacher can select which states to demonstrate to an LBC learner. The learner maintains a version space of infinite linear hypotheses consistent with the demonstration. The goal of the teacher is to teach a realizable target policy to the learner using minimum number of state demonstrations. This number is known as the Teaching Dimension(TD). We present a teaching algorithm called ``Teach using Iterative Elimination(TIE)" that achieves instance optimal TD. However, we also show that finding optimal teaching set computationally is NP-hard. We further provide an approximation algorithm that guarantees an approximation ratio of $\log(|A|-1)$ on the teaching dimension. Finally, we provide experimental results to validate the efficiency and effectiveness of our algorithm.

Via

Access Paper or Ask Questions

Learning interactions to boost human creativity with bandits and GPT-4

Nov 16, 2023

Ara Vartanian, Xiaoxi Sun, Yun-Shiuan Chuang, Siddharth Suresh, Xiaojin Zhu, Timothy T. Rogers

Figure 1 for Learning interactions to boost human creativity with bandits and GPT-4

Figure 2 for Learning interactions to boost human creativity with bandits and GPT-4

Figure 3 for Learning interactions to boost human creativity with bandits and GPT-4

Figure 4 for Learning interactions to boost human creativity with bandits and GPT-4

Abstract:This paper considers how interactions with AI algorithms can boost human creative thought. We employ a psychological task that demonstrates limits on human creativity, namely semantic feature generation: given a concept name, respondents must list as many of its features as possible. Human participants typically produce only a fraction of the features they know before getting "stuck." In experiments with humans and with a language AI (GPT-4) we contrast behavior in the standard task versus a variant in which participants can ask for algorithmically-generated hints. Algorithm choice is administered by a multi-armed bandit whose reward indicates whether the hint helped generating more features. Humans and the AI show similar benefits from hints, and remarkably, bandits learning from AI responses prefer the same prompting strategy as those learning from human behavior. The results suggest that strategies for boosting human creativity via computer interactions can be learned by bandits run on groups of simulated participants.

Via

Access Paper or Ask Questions

Anytime-Constrained Reinforcement Learning

Nov 14, 2023

Jeremy McMahan, Xiaojin Zhu

Abstract:We introduce and study constrained Markov Decision Processes (cMDPs) with anytime constraints. An anytime constraint requires the agent to never violate its budget at any point in time, almost surely. Although Markovian policies are no longer sufficient, we show that there exist optimal deterministic policies augmented with cumulative costs. In fact, we present a fixed-parameter tractable reduction from anytime-constrained cMDPs to unconstrained MDPs. Our reduction yields planning and learning algorithms that are time and sample-efficient for tabular cMDPs so long as the precision of the costs is logarithmic in the size of the cMDP. However, we also show that computing non-trivial approximately optimal policies is NP-hard in general. To circumvent this bottleneck, we design provable approximation algorithms that efficiently compute or learn an arbitrarily accurate approximately feasible policy with optimal value so long as the maximum supported cost is bounded by a polynomial in the cMDP or the absolute budget. Given our hardness results, our approximation guarantees are the best possible under worst-case analysis.

Via

Access Paper or Ask Questions