Michael P. Wellman

Policy Abstraction and Nash Refinement in Tree-Exploiting PSRO

Feb 05, 2025

Christine Konicki, Mithun Chakraborty, Michael P. Wellman

Abstract: Policy Space Response Oracles (PSRO) interleaves empirical game-theoretic analysis with deep reinforcement learning (DRL) to solve games too complex for traditional analytic methods. Tree-exploiting PSRO (TE-PSRO) is a variant of this approach that iteratively builds a coarsened empirical game model in extensive form using data obtained from querying a simulator that represents a detailed description of the game. We make two main methodological advances to TE-PSRO that enhance its applicability to complex games of imperfect information. First, we introduce a scalable representation for the empirical game tree where edges correspond to implicit policies learned through DRL. These policies cover conditions in the underlying game abstracted in the game model, supporting sustainable growth of the tree over epochs. Second, we leverage extensive form in the empirical model by employing refined Nash equilibria to direct strategy exploration. To enable this, we give a modular and scalable algorithm based on generalized backward induction for computing a subgame perfect equilibrium (SPE) in an imperfect-information game. We experimentally evaluate our approach on a suite of games including an alternating-offer bargaining game with outside offers; our results demonstrate that TE-PSRO converges toward equilibrium faster when new strategies are generated based on SPE rather than Nash equilibrium, and with reasonable time/memory requirements for the growing empirical model.
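
The abstract refers to generalized backward induction for computing a subgame perfect equilibrium. As a rough illustration only, the sketch below shows plain backward induction on a small perfect-information tree, which is the textbook special case and not the paper's generalized algorithm. The `Node` structure, the action names, and the payoffs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Node:
    """A node in a perfect-information game tree (hypothetical structure)."""
    player: int = -1                   # index of the player to move; unused at leaves
    payoffs: Tuple[float, ...] = ()    # payoff vector, set only at leaves
    children: Dict[str, "Node"] = field(default_factory=dict)  # action -> subtree


def backward_induction(node, path=()):
    """Return (payoff vector, plan); plan maps each decision path to an action."""
    if not node.children:              # leaf: nothing left to decide
        return node.payoffs, {}
    plan = {}
    best_action, best_value = None, None
    for action, child in node.children.items():
        value, sub_plan = backward_induction(child, path + (action,))
        plan.update(sub_plan)          # keep choices in every subgame, on or off path
        if best_value is None or value[node.player] > best_value[node.player]:
            best_action, best_value = action, value
    plan[path] = best_action
    return best_value, plan


# Hypothetical ultimatum-style game: player 0 proposes, player 1 responds.
def responder(accept_payoffs):
    return Node(player=1, children={
        "accept": Node(payoffs=accept_payoffs),
        "reject": Node(payoffs=(0, 0)),
    })


root = Node(player=0, children={
    "fair": responder((5, 5)),
    "greedy": responder((8, 2)),
})

value, plan = backward_induction(root)
```

Because the plan records a choice at every decision node, including those off the equilibrium path, it describes behavior in every subgame, which is the property that distinguishes subgame perfection from Nash equilibrium in general.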

Co-Learning Empirical Games and World Models

May 23, 2023

Max Olan Smith, Michael P. Wellman

Abstract: Game-based decision-making involves reasoning over both world dynamics and strategic interactions among the agents. Typically, empirical models capturing these respective aspects are learned and used separately. We investigate the potential gain from co-learning these elements: a world model for dynamics and an empirical game for strategic interactions. Empirical games drive world models toward a broader consideration of possible game dynamics induced by a diversity of strategy profiles. Conversely, world models guide empirical games to efficiently discover new strategies through planning. We demonstrate these benefits first independently, then in combination as realized by a new algorithm, Dyna-PSRO, that co-learns an empirical game and a world model. Compared to PSRO, a baseline empirical-game building algorithm, Dyna-PSRO is found to compute lower-regret solutions on partially observable general-sum games. In our experiments, Dyna-PSRO also requires substantially fewer experiences than PSRO, a key algorithmic advantage for settings where collecting player-game interaction data is a cost-limiting factor.

Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning

Feb 01, 2023

Zun Li, Marc Lanctot, Kevin R. McKee, Luke Marris, Ian Gemp, Daniel Hennes, Paul Muller, Kate Larson, Yoram Bachrach, Michael P. Wellman

Abstract: Multiagent reinforcement learning (MARL) has benefited significantly from population-based and game-theoretic training regimes. One approach, Policy-Space Response Oracles (PSRO), employs standard reinforcement learning to compute response policies via approximate best responses and combines them via meta-strategy selection. We augment PSRO by adding a novel search procedure with generative sampling of world states, and introduce two new meta-strategy solvers based on the Nash bargaining solution. We evaluate PSRO's ability to compute an approximate Nash equilibrium, and its performance in two negotiation games: Colored Trails, and Deal or No Deal. We conduct behavioral studies where human participants negotiate with our agents ($N = 346$). We find that search with generative modeling finds stronger policies during both training time and test time, enables online Bayesian co-player prediction, and can produce agents that, when negotiating with humans, achieve social welfare comparable to that of humans trading among themselves.
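
The PSRO loop described in the abstract (compute a response to the current meta-strategy, add it to the population, re-solve) can be sketched on a toy matrix game. This is a minimal illustration under stated assumptions, not the paper's method: the game is rock-paper-scissors, an exact best response stands in for the reinforcement-learning oracle, and fictitious play stands in for the meta-strategy solver. All function names are hypothetical.

```python
import numpy as np

# Row player's payoffs in rock-paper-scissors (symmetric, zero-sum).
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]], dtype=float)


def meta_strategy(population, iterations=2000):
    """Approximate a restricted-game equilibrium by fictitious play (assumed solver)."""
    sub = PAYOFF[np.ix_(population, population)]
    counts = np.ones(len(population))
    for _ in range(iterations):
        mix = counts / counts.sum()
        counts[np.argmax(sub @ mix)] += 1   # best reply within the restricted game
    return counts / counts.sum()


def best_response(population, mix):
    """Exact best response over the full game; PSRO would train this with RL."""
    full_mix = np.zeros(PAYOFF.shape[0])
    full_mix[population] = mix
    return int(np.argmax(PAYOFF @ full_mix))


def psro(initial=0, max_epochs=10):
    """Grow the strategy population until the oracle returns nothing new."""
    population = [initial]
    for _ in range(max_epochs):
        mix = meta_strategy(population)
        response = best_response(population, mix)
        if response in population:          # oracle found no new strategy: stop
            break
        population.append(response)
    return population, meta_strategy(population)


population, mix = psro()
```

Starting from rock alone, the oracle adds paper and then scissors, after which no new strategy appears and the loop stops with the full strategy set in the population.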

Multiattribute Auctions Based on Generalized Additive Independence

Jan 16, 2014

Yagil Engel, Michael P. Wellman

Abstract: We develop multiattribute auctions that accommodate generalized additive independent (GAI) preferences. We propose an iterative auction mechanism that maintains prices on potentially overlapping GAI clusters of attributes, thereby decreasing the elicitation and computational burden and creating open competition among suppliers over a multidimensional domain. Most significantly, the auction is guaranteed to achieve surplus that approximates optimal welfare up to a small additive factor, under reasonable equilibrium strategies of traders. The main departure of GAI auctions from previous literature is to accommodate non-additive trader preferences, hence allowing traders to condition their evaluation of specific attributes on the value of other attributes. At the same time, the GAI structure supports a compact representation of prices, enabling a tractable auction process. We perform a simulation study, demonstrating and quantifying the significant efficiency advantage of more expressive preference modeling. We draw random GAI-structured utility functions with various internal structures, generate additive functions that approximate the GAI utility, and compare the performance of the auctions using the two representations. We find that allowing traders to express existing dependencies among attributes improves the economic efficiency of multiattribute auctions.

* Journal of Artificial Intelligence Research, Volume 37, pages 479-525, 2010
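
To make the GAI idea in the abstract concrete, the sketch below evaluates a utility that is a sum of sub-utilities over overlapping clusters of attributes. The attribute names, clusters, and sub-utility functions are invented for illustration and do not come from the paper.

```python
from typing import Callable, Dict, Sequence, Tuple

Cluster = Tuple[str, ...]
SubUtility = Callable[..., float]


def gai_utility(outcome: Dict[str, float],
                terms: Sequence[Tuple[Cluster, SubUtility]]) -> float:
    """Sum the sub-utility of each attribute cluster for the given outcome."""
    return sum(f(*(outcome[a] for a in cluster)) for cluster, f in terms)


# Hypothetical procurement preferences; the two clusters overlap on warranty_years.
terms = [
    # A longer warranty is worth more when quality is high (a non-additive term).
    (("quality", "warranty_years"), lambda q, w: 10 * q + 2 * q * w),
    # Warranty is valued and delivery delay penalized.
    (("warranty_years", "delivery_days"), lambda w, d: w - d),
]

base = {"quality": 3, "warranty_years": 2, "delivery_days": 4}
total = gai_utility(base, terms)


def warranty_gain(quality: float) -> float:
    """Value of one extra warranty year at the given quality level."""
    low = dict(base, quality=quality)
    high = dict(low, warranty_years=low["warranty_years"] + 1)
    return gai_utility(high, terms) - gai_utility(low, terms)
```

In this example the value of an extra warranty year depends on the quality level, which a purely additive utility could not express.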

Qualitative Probabilistic Networks for Planning Under Uncertainty

Mar 27, 2013

Michael P. Wellman

Abstract: Bayesian networks provide a probabilistic semantics for qualitative assertions about likelihood. A qualitative reasoner based on an algebra over these assertions can derive further conclusions about the influence of actions. While the conclusions are much weaker than those computed from complete probability distributions, they are still valuable for suggesting potential actions, eliminating obviously inferior plans, identifying important tradeoffs, and explaining probabilistic models.

* Appears in Proceedings of the Second Conference on Uncertainty in Artificial Intelligence (UAI1986)
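
The abstract mentions an algebra over qualitative assertions without spelling it out. The sketch below assumes the sign algebra commonly associated with qualitative probabilistic networks, with signs `+`, `-`, `0`, and `?` (ambiguous), a product for chained influences, and a sum for parallel ones. The function names and the example paths are hypothetical.

```python
from functools import reduce


def sign_product(a: str, b: str) -> str:
    """Combine signs along a chain of influences."""
    if a == "0" or b == "0":
        return "0"                 # a broken link blocks the whole chain
    if a == "?" or b == "?":
        return "?"                 # ambiguity propagates along the chain
    return "+" if a == b else "-"


def sign_sum(a: str, b: str) -> str:
    """Combine signs of parallel influences on the same variable."""
    if a == "0":
        return b
    if b == "0":
        return a
    return a if a == b else "?"    # opposing or ambiguous influences are unresolved


def net_influence(paths) -> str:
    """Net sign of an action on an outcome, given the edge signs of each path."""
    path_signs = [reduce(sign_product, path, "+") for path in paths]
    return reduce(sign_sum, path_signs, "0")


# Two paths that agree give a definite conclusion; conflicting paths do not.
agree = net_influence([["+", "-"], ["-", "+"]])
conflict = net_influence([["+", "-"], ["+", "+"]])
```

The conflicting case illustrates the abstract's point that qualitative conclusions are weaker than numeric ones: a tradeoff is identified but not resolved.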

The Role of Calculi in Uncertain Inference Systems

Mar 27, 2013

Michael P. Wellman, David Heckerman

Abstract: Much of the controversy about methods for automated decision making has focused on specific calculi for combining beliefs or propagating uncertainty. We broaden the debate by (1) exploring the constellation of secondary tasks surrounding any primary decision problem, and (2) identifying knowledge engineering concerns that present additional representational tradeoffs. We argue on pragmatic grounds that the attempt to support all of these tasks within a single calculus is misguided. In the process, we note several uncertain reasoning objectives that conflict with the Bayesian ideal of complete specification of probabilities and utilities. In response, we advocate treating the uncertainty calculus as an object language for reasoning mechanisms that support the secondary tasks. Arguments against Bayesian decision theory are weakened when the calculus is relegated to this role. Architectures for uncertainty handling that take statements in the calculus as objects to be reasoned about offer the prospect of retaining normative status with respect to decision making while supporting the other tasks in uncertain reasoning.

* Appears in Proceedings of the Third Conference on Uncertainty in Artificial Intelligence (UAI1987)

Exploiting Functional Dependencies in Qualitative Probabilistic Reasoning

Mar 27, 2013

Michael P. Wellman

Abstract: Functional dependencies restrict the potential interactions among variables connected in a probabilistic network. This restriction can be exploited in qualitative probabilistic reasoning by introducing deterministic variables and modifying the inference rules to produce stronger conclusions in the presence of functional relations. I describe how to accomplish these modifications in qualitative probabilistic networks by exhibiting the update procedures for graphical transformations involving probabilistic and deterministic variables and combinations. A simple example demonstrates that the augmented scheme can reduce qualitative ambiguity that would arise without the special treatment of functional dependency. Analysis of qualitative synergy reveals that new higher-order relations are required to reason effectively about synergistic interactions among deterministic variables.

* Appears in Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence (UAI1990)

State-space Abstraction for Anytime Evaluation of Probabilistic Networks

Feb 27, 2013

Michael P. Wellman, Chao-Lin Liu

Abstract: One important factor determining the computational complexity of evaluating a probabilistic network is the cardinality of the state spaces of the nodes. By varying the granularity of the state spaces, one can trade off accuracy in the result for computational efficiency. We present an anytime procedure for approximate evaluation of probabilistic networks based on this idea. On application to some simple networks, the procedure exhibits a smooth improvement in approximation quality as computation time increases. This suggests that state-space abstraction is one more useful control parameter for designing real-time probabilistic reasoners.

* Appears in Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence (UAI1994)
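
The granularity tradeoff in the abstract can be illustrated on a two-node network X -> Y. The sketch below merges states of X into blocks and refines the partition step by step. The uniform averaging of conditional-probability rows inside a block is an assumption made here for simplicity, and the numbers are invented; the paper's actual procedure is not reproduced.

```python
import numpy as np


def abstract_marginal(prior, cpt, partition):
    """Approximate P(Y) after merging states of X into the blocks of `partition`.

    prior: P(X), shape (n,). cpt: P(Y | X), shape (n, m).
    Each abstract state carries the summed prior mass of its members and the
    uniform average of their CPT rows (an assumed abstraction rule).
    """
    result = np.zeros(cpt.shape[1])
    for block in partition:
        mass = prior[block].sum()
        row = cpt[block].mean(axis=0)
        result += mass * row
    return result


prior = np.array([0.1, 0.2, 0.3, 0.4])
cpt = np.array([[0.9, 0.1],
                [0.7, 0.3],
                [0.4, 0.6],
                [0.2, 0.8]])
exact = prior @ cpt

# Anytime-style refinement: each step splits the abstract states further.
partitions = [
    [[0, 1, 2, 3]],
    [[0, 1], [2, 3]],
    [[0], [1], [2], [3]],
]
approximations = [abstract_marginal(prior, cpt, p) for p in partitions]
errors = [float(np.abs(a - exact).max()) for a in approximations]
```

Each refinement costs more work per evaluation, since more abstract states are summed over, and on this example the error shrinks at every step, reaching zero (up to rounding) at the finest partition.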

The Automated Mapping of Plans for Plan Recognition

Feb 27, 2013

Marcus J. Huber, Edmund H. Durfee, Michael P. Wellman

Abstract: To coordinate with other agents in its environment, an agent needs models of what the other agents are trying to do. When communication is impossible or expensive, this information must be acquired indirectly via plan recognition. Typical approaches to plan recognition start with a specification of the possible plans the other agents may be following, and develop special techniques for discriminating among the possibilities. Perhaps more desirable would be a uniform procedure for mapping plans to general structures supporting inference based on uncertain and incomplete observations. In this paper, we describe a set of methods for converting plans represented in a flexible procedural language to observation models represented as probabilistic belief networks.

* Appears in Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence (UAI1994)

Path Planning under Time-Dependent Uncertainty

Feb 20, 2013

Michael P. Wellman, Matthew Ford, Kenneth Larson

Abstract: Standard algorithms for finding the shortest path in a graph require that the cost of a path be additive in edge costs, and typically assume that costs are deterministic. We consider the problem of uncertain edge costs, with potential probabilistic dependencies among the costs. Although these dependencies violate the standard dynamic-programming decomposition, we identify a weaker stochastic consistency condition that justifies a generalized dynamic-programming approach based on stochastic dominance. We present a revised path-planning algorithm and prove that it produces optimal paths under time-dependent uncertain costs. We test the algorithm by applying it to a model of stochastic bus networks, and present empirical performance results comparing it to some alternatives. Finally, we consider extensions of these concepts to a more general class of problems of heuristic search under uncertainty.

* Appears in Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI1995)
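
The abstract bases its dynamic-programming approach on stochastic dominance. The sketch below shows only a first-order dominance test between discrete cost distributions and its use for pruning candidate partial paths; it is not the paper's algorithm. The shared cost grid, the path labels, and the probabilities are hypothetical.

```python
import numpy as np


def dominates(p, q, tol=1e-12):
    """True if cost distribution p first-order dominates q (lower cost preferred).

    p and q give probabilities over the same ascending grid of cost values.
    p dominates q when its CDF is never below q's and is above it somewhere.
    """
    cdf_p, cdf_q = np.cumsum(p), np.cumsum(q)
    never_worse = np.all(cdf_p >= cdf_q - tol)
    somewhere_better = np.any(cdf_p > cdf_q + tol)
    return bool(never_worse and somewhere_better)


def prune(candidates):
    """Keep the labels of candidates that no other candidate dominates."""
    return [
        name for name, p in candidates.items()
        if not any(dominates(q, p)
                   for other, q in candidates.items() if other != name)
    ]


# Hypothetical arrival-cost distributions over the grid [10, 20, 30] minutes.
candidates = {
    "A": [0.5, 0.3, 0.2],
    "B": [0.2, 0.3, 0.5],
    "C": [0.6, 0.0, 0.4],
}
survivors = prune(candidates)
```

Paths A and C are incomparable under this test, so both survive, while B is dominated and can be discarded. Dominance yields only a partial order, so a dominance-based search may need to carry several candidates per node rather than a single best one.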
