Abstract: We describe an approximate dynamic programming (ADP) approach to compute approximately optimal strategies, and approximations of the minimal losses that can be guaranteed, in discounted repeated games with vector losses. At the core of our approach is a characterization of the lower Pareto frontier of the set of expected losses that a player can guarantee in these games as the unique fixed point of a set-valued dynamic programming (DP) operator. This fixed point can be approximated by iteratively applying the DP operator, compounded with a polytopic set approximation, beginning with a single point. Each iteration can be computed by solving a set of linear programs corresponding to the vertices of the polytope. We derive rigorous bounds on the error of the resulting approximation and on the performance of the corresponding approximately optimal strategies. We discuss an application to regret minimization in repeated decision-making in adversarial environments, where we show that this approach can be used to compute approximately optimal strategies and approximations of the minimax optimal regret when the action sets are finite. We illustrate this approach by computing provably approximately optimal strategies for the problem of prediction using expert advice under discounted $\{0,1\}$-losses. Our numerical evaluations demonstrate the sub-optimality of well-known off-the-shelf online learning algorithms such as Hedge, and the significantly improved performance of our approximately optimal strategies in these settings. Our work thus demonstrates the significant potential of the ADP framework for designing effective online learning algorithms.
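The iterative scheme sketched in this abstract — start from a single point, apply the DP operator, and keep only the lower Pareto frontier — can be illustrated with a much-simplified toy. The discount factor and the two-action stage losses below are hypothetical, and the sketch deliberately omits the adversary's minimax structure, mixed strategies, and the linear programs solved at the polytope vertices; it only conveys the flavor of the set-valued iteration.

```python
# Toy sketch of a set-valued DP iteration on a lower Pareto frontier.
# NOT the paper's algorithm: no adversary, no mixing, no LPs; the
# discount factor and stage losses are illustrative assumptions.

BETA = 0.8  # discount factor (illustrative)
# Hypothetical 2-dimensional loss vectors for two actions in a toy game.
STAGE_LOSS = [(0.0, 1.0), (1.0, 0.0)]

def lower_pareto(points):
    """Keep only points not dominated componentwise by another point."""
    pts = sorted(set(points))
    return [p for p in pts
            if not any(q != p and q[0] <= p[0] and q[1] <= p[1]
                       for q in pts)]

def dp_step(frontier):
    """One application of the simplified set-valued operator: incur one
    stage loss now, then continue from some point on the current
    frontier; prune the result back to its lower Pareto frontier."""
    new_pts = [tuple((1 - BETA) * l + BETA * v
                     for l, v in zip(loss, point))
               for loss in STAGE_LOSS for point in frontier]
    return lower_pareto(new_pts)

# Begin with a single point, as in the abstract's scheme, and iterate.
frontier = [(0.0, 0.0)]
for _ in range(10):
    frontier = dp_step(frontier)
print(len(frontier), frontier[:2])
```

In this toy the number of frontier points doubles at every step, which hints at why the actual method compounds each DP iteration with a polytopic set approximation: without it, the representation of the frontier quickly becomes unmanageable.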
Abstract: We consider a problem of prediction based on opinions elicited from heterogeneous rational agents with private information. Making an accurate prediction at minimal cost requires a joint design of the incentive mechanism and the prediction algorithm. Such a problem lies at the nexus of statistical learning theory and game theory, and arises in many domains such as consumer surveys and mobile crowdsourcing. In order to elicit heterogeneous agents' private information and incentivize agents with different capabilities to act in the principal's best interest, we design an optimal joint incentive mechanism and prediction algorithm called COPE (COst and Prediction Elicitation), the analysis of which offers several valuable engineering insights. First, when the costs incurred by the agents are linear in the exerted effort, COPE corresponds to a "crowd contending" mechanism, where the principal employs only the agent with the highest capability. Second, when the costs are quadratic, COPE corresponds to a "crowd-sourcing" mechanism that employs multiple agents with different capabilities at the same time. Numerical simulations show that COPE improves the principal's profit and the network profit significantly (by more than 30% in our simulations) compared to mechanisms that assume all agents have equal capabilities.