Abstract:Game-based decision-making involves reasoning over both world dynamics and strategic interactions among the agents. Typically, empirical models capturing these respective aspects are learned and used separately. We investigate the potential gain from co-learning these elements: a world model for dynamics and an empirical game for strategic interactions. Empirical games drive world models toward a broader consideration of possible game dynamics induced by a diversity of strategy profiles. Conversely, world models guide empirical games to efficiently discover new strategies through planning. We demonstrate these benefits first independently, then in combination as realized by a new algorithm, Dyna-PSRO, that co-learns an empirical game and a world model. Compared to PSRO, a baseline empirical-game building algorithm, Dyna-PSRO is found to compute lower-regret solutions on partially observable general-sum games. In our experiments, Dyna-PSRO also requires substantially fewer experiences than PSRO, a key algorithmic advantage for settings where collecting player-game interaction data is a cost-limiting factor.
Abstract:Multiagent reinforcement learning (MARL) has benefited significantly from population-based and game-theoretic training regimes. One approach, Policy-Space Response Oracles (PSRO), employs standard reinforcement learning to compute response policies via approximate best responses and combines them via meta-strategy selection. We augment PSRO by adding a novel search procedure with generative sampling of world states, and introduce two new meta-strategy solvers based on the Nash bargaining solution. We evaluate PSRO's ability to compute an approximate Nash equilibrium, and its performance in two negotiation games: Colored Trails, and Deal or No Deal. We conduct behavioral studies where human participants negotiate with our agents ($N = 346$). We find that search with generative modeling finds stronger policies during both training time and test time, enables online Bayesian co-player prediction, and can produce agents that, when negotiating with humans, achieve social welfare comparable to that of humans trading among themselves.
Abstract:We develop multiattribute auctions that accommodate generalized additive independent (GAI) preferences. We propose an iterative auction mechanism that maintains prices on potentially overlapping GAI clusters of attributes, thereby decreasing the elicitation and computational burden and creating an open competition among suppliers over a multidimensional domain. Most significantly, the auction is guaranteed to achieve a surplus that approximates optimal welfare up to a small additive factor, under reasonable equilibrium strategies of traders. The main departure of GAI auctions from previous literature is to accommodate non-additive trader preferences, hence allowing traders to condition their evaluation of specific attributes on the value of other attributes. At the same time, the GAI structure supports a compact representation of prices, enabling a tractable auction process. We perform a simulation study, demonstrating and quantifying the significant efficiency advantage of more expressive preference modeling. We draw random GAI-structured utility functions with various internal structures, generate additive functions that approximate the GAI utility, and compare the performance of the auctions using the two representations. We find that allowing traders to express existing dependencies among attributes improves the economic efficiency of multiattribute auctions.
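To make the GAI structure concrete, the following minimal sketch evaluates a utility function written as a sum of sub-utilities over overlapping attribute clusters. The attribute names, cluster tables, and the helper `gai_utility` are hypothetical illustrations, not taken from the paper, and the sketch covers only utility evaluation, not the auction mechanism.

```python
def gai_utility(config, clusters):
    """Sum the sub-utility of each cluster for one full attribute assignment.

    config:   dict mapping attribute name -> chosen value
    clusters: list of (attribute_tuple, table) pairs, where table maps a
              tuple of values (in attribute_tuple order) to a number.
    Clusters may share attributes; that overlap is what lets the value of
    one attribute depend on another.
    """
    total = 0
    for attrs, table in clusters:
        key = tuple(config[a] for a in attrs)
        total += table[key]
    return total


# Hypothetical example: "size" appears in both clusters.
CLUSTERS = [
    (("color", "size"), {
        ("red", "S"): 3, ("red", "L"): 5,
        ("blue", "S"): 2, ("blue", "L"): 4,
    }),
    (("size", "delivery"), {
        ("S", "fast"): 4, ("S", "slow"): 1,
        ("L", "fast"): 2, ("L", "slow"): 0,
    }),
]

offer_a = {"color": "red", "size": "L", "delivery": "fast"}
offer_b = {"color": "blue", "size": "S", "delivery": "fast"}

utility_a = gai_utility(offer_a, CLUSTERS)  # 5 + 2
utility_b = gai_utility(offer_b, CLUSTERS)  # 2 + 4
```

A purely additive utility is the special case in which every cluster contains a single attribute, so the same function can evaluate both representations.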
Abstract:Bayesian networks provide a probabilistic semantics for qualitative assertions about likelihood. A qualitative reasoner based on an algebra over these assertions can derive further conclusions about the influence of actions. While the conclusions are much weaker than those computed from complete probability distributions, they are still valuable for suggesting potential actions, eliminating obviously inferior plans, identifying important tradeoffs, and explaining probabilistic models.
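As an illustration of what an algebra over qualitative assertions can look like, the sketch below implements a sign algebra of the kind commonly used in qualitative probabilistic networks. It assumes the four signs `+`, `-`, `0`, and `?` (unknown), combines influences along a chain with a sign product, and combines parallel paths with a sign sum. The abstract does not spell out these tables, so the function names and the exact operator definitions here are assumptions.

```python
def sign_product(a, b):
    """Combine two influences in sequence (along a chain)."""
    if a == "0" or b == "0":
        return "0"          # a zero influence blocks the chain
    if a == "?" or b == "?":
        return "?"          # unknown stays unknown
    return "+" if a == b else "-"


def sign_sum(a, b):
    """Combine two influences acting in parallel."""
    if a == "0":
        return b            # zero is the identity for the sum
    if b == "0":
        return a
    if a == b:
        return a            # agreeing influences reinforce each other
    return "?"              # conflicting or unknown influences are ambiguous


def net_influence(paths):
    """Net sign of several parallel paths, each given as a list of edge signs."""
    total = "0"
    for path in paths:
        along = "+"         # '+' is the identity for the product
        for edge in path:
            along = sign_product(along, edge)
        total = sign_sum(total, along)
    return total
```

For example, `net_influence([["+", "-"], ["-"]])` yields `"-"` because both paths are negative, while `net_influence([["+"], ["-"]])` yields `"?"`. The second case is the kind of tradeoff that qualitative reasoning can identify but cannot resolve without numeric information.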
Abstract:Much of the controversy about methods for automated decision making has focused on specific calculi for combining beliefs or propagating uncertainty. We broaden the debate by (1) exploring the constellation of secondary tasks surrounding any primary decision problem, and (2) identifying knowledge engineering concerns that present additional representational tradeoffs. We argue on pragmatic grounds that the attempt to support all of these tasks within a single calculus is misguided. In the process, we note several uncertain reasoning objectives that conflict with the Bayesian ideal of complete specification of probabilities and utilities. In response, we advocate treating the uncertainty calculus as an object language for reasoning mechanisms that support the secondary tasks. Arguments against Bayesian decision theory are weakened when the calculus is relegated to this role. Architectures for uncertainty handling that take statements in the calculus as objects to be reasoned about offer the prospect of retaining normative status with respect to decision making while supporting the other tasks in uncertain reasoning.
Abstract:Functional dependencies restrict the potential interactions among variables connected in a probabilistic network. This restriction can be exploited in qualitative probabilistic reasoning by introducing deterministic variables and modifying the inference rules to produce stronger conclusions in the presence of functional relations. I describe how to accomplish these modifications in qualitative probabilistic networks by exhibiting the update procedures for graphical transformations involving probabilistic and deterministic variables and combinations. A simple example demonstrates that the augmented scheme can reduce qualitative ambiguity that would arise without the special treatment of functional dependency. Analysis of qualitative synergy reveals that new higher-order relations are required to reason effectively about synergistic interactions among deterministic variables.
Abstract:One important factor determining the computational complexity of evaluating a probabilistic network is the cardinality of the state spaces of the nodes. By varying the granularity of the state spaces, one can trade off accuracy in the result for computational efficiency. We present an anytime procedure for approximate evaluation of probabilistic networks based on this idea. On application to some simple networks, the procedure exhibits a smooth improvement in approximation quality as computation time increases. This suggests that state-space abstraction is one more useful control parameter for designing real-time probabilistic reasoners.
Abstract:To coordinate with other agents in its environment, an agent needs models of what the other agents are trying to do. When communication is impossible or expensive, this information must be acquired indirectly via plan recognition. Typical approaches to plan recognition start with a specification of the possible plans the other agents may be following, and develop special techniques for discriminating among the possibilities. Perhaps more desirable would be a uniform procedure for mapping plans to general structures supporting inference based on uncertain and incomplete observations. In this paper, we describe a set of methods for converting plans represented in a flexible procedural language to observation models represented as probabilistic belief networks.
Abstract:Standard algorithms for finding the shortest path in a graph require that the cost of a path be additive in edge costs, and typically assume that costs are deterministic. We consider the problem of uncertain edge costs, with potential probabilistic dependencies among the costs. Although these dependencies violate the standard dynamic-programming decomposition, we identify a weaker stochastic consistency condition that justifies a generalized dynamic-programming approach based on stochastic dominance. We present a revised path-planning algorithm and prove that it produces optimal paths under time-dependent uncertain costs. We test the algorithm by applying it to a model of stochastic bus networks, and present empirical performance results comparing it to some alternatives. Finally, we consider extensions of these concepts to a more general class of problems of heuristic search under uncertainty.
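The role of stochastic dominance in pruning candidate paths can be sketched as follows. This is an illustration only, not the paper's algorithm: it assumes first-order stochastic dominance over discrete cost distributions, represented as dicts from cost to probability, and the names `cdf_at`, `dominates`, and `prune` are hypothetical.

```python
def cdf_at(dist, x):
    """P(cost <= x) for a discrete distribution {cost: probability}."""
    return sum(p for cost, p in dist.items() if cost <= x)


def dominates(a, b, tol=1e-12):
    """True if cost distribution a first-order dominates b.

    Lower cost is better, so a dominates b when a's CDF is at least
    b's CDF everywhere and strictly greater somewhere.
    """
    points = sorted(set(a) | set(b))
    at_least = all(cdf_at(a, x) >= cdf_at(b, x) - tol for x in points)
    strictly = any(cdf_at(a, x) > cdf_at(b, x) + tol for x in points)
    return at_least and strictly


def prune(candidates):
    """Keep only the candidates not dominated by any other candidate."""
    return [
        a for a in candidates
        if not any(dominates(b, a) for b in candidates if b is not a)
    ]


# Hypothetical cost distributions for three partial paths to the same node.
path_a = {10: 0.5, 20: 0.5}
path_b = {10: 0.2, 20: 0.8}   # dominated by path_a
path_c = {5: 0.5, 30: 0.5}    # incomparable with path_a
survivors = prune([path_a, path_b, path_c])
```

Unlike deterministic shortest-path search, which keeps one best label per node, a dominance-based approach may need to keep several incomparable candidates, as `path_a` and `path_c` show here.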
Abstract:Typical approaches to plan recognition start from a representation of an agent's possible plans, and reason evidentially from observations of the agent's actions to assess the plausibility of the various candidates. A more expansive view of the task (consistent with some prior work) accounts for the context in which the plan was generated, the mental state and planning process of the agent, and consequences of the agent's actions in the world. We present a general Bayesian framework encompassing this view, and focus on how context can be exploited in plan recognition. We demonstrate the approach on a problem in traffic monitoring, where the objective is to induce the plan of the driver from observation of vehicle movements. Starting from a model of how the driver generates plans, we show how the highway context can appropriately influence the recognizer's interpretation of observed driver behavior.