Abstract: The centralized training for decentralized execution paradigm has emerged as the state-of-the-art approach to epsilon-optimally solving decentralized partially observable Markov decision processes. However, scalability remains a significant issue. This paper presents a novel and more scalable alternative, namely sequential-move centralized training for decentralized execution. This paradigm further pushes the applicability of Bellman's principle of optimality, yielding three new properties. First, it allows a central planner to reason upon sufficient sequential-move statistics instead of the prior simultaneous-move ones. Next, it proves that epsilon-optimal value functions are piecewise linear and convex in such sufficient sequential-move statistics. Finally, it drops the complexity of the backup operators from double exponential to polynomial at the expense of longer planning horizons. Besides, it makes single-agent methods easy to apply: e.g., the SARSA algorithm, enhanced with these findings, applies while preserving its convergence guarantees. Experiments on two- as well as many-agent domains from the literature against epsilon-optimal simultaneous-move solvers confirm the superiority of the novel approach. This paradigm opens the door to efficient planning and reinforcement learning methods for multi-agent systems.
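To make the piecewise linear and convex (PWLC) property concrete, here is a minimal Python sketch, with invented names and dimensions, of a lower-bound value function stored as a finite set of hyperplanes over a sufficient sequential-move statistic; it only illustrates the max-of-hyperplanes structure the abstract refers to, not the authors' implementation.

import numpy as np

class PWLCValueFunction:
    """Lower bound kept as a finite set of hyperplanes (alpha-vectors)."""
    def __init__(self, dim):
        self.alphas = [np.zeros(dim)]

    def value(self, statistic):
        # Max over linear functions of the statistic: piecewise linear and convex.
        return max(float(alpha @ statistic) for alpha in self.alphas)

    def add(self, alpha):
        self.alphas.append(np.asarray(alpha, dtype=float))

# Hypothetical sufficient statistic: a distribution over hidden states and
# histories, flattened into a point of the simplex.
V = PWLCValueFunction(dim=4)
V.add(np.array([1.0, 0.5, 0.0, 2.0]))
sigma = np.array([0.25, 0.25, 0.25, 0.25])
print(V.value(sigma))   # 0.875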
Abstract: A recent theory shows that a multi-player decentralized partially observable Markov decision process can be transformed into an equivalent single-player game, enabling the application of Bellman's principle of optimality to solve the single-player game by breaking it down into single-stage subgames. However, this approach entangles the decision variables of all players at each single-stage subgame, resulting in backups with double-exponential complexity. This paper demonstrates how to disentangle these decision variables while maintaining optimality under hierarchical information sharing, a prominent management style in our society. To achieve this, we apply the principle of optimality to solve any single-stage subgame by breaking it down further into smaller subgames, enabling us to make decisions for one player at a time. Our approach reveals that an extensive-form game whose solutions also solve the single-stage subgame always exists, significantly reducing time complexity. Our experimental results show that algorithms leveraging these findings can scale up to much larger multi-player games without compromising optimality.
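The complexity reduction the abstract alludes to can be illustrated with a toy one-stage team game (the payoff function and player order below are invented): enumerating all joint decisions costs |A|^n evaluations, whereas deciding for one player at a time costs n*|A|. In general the two procedures need not agree; the paper's point is that, under hierarchical information sharing, sequential decisions can be made without losing optimality.

from itertools import product

n_players, n_actions = 3, 4

def team_reward(joint):
    # Hypothetical separable team payoff, chosen only for illustration.
    return sum(-(a - 2) ** 2 + a for a in joint)

# Simultaneous-move view: enumerate all |A|^n joint decisions.
best_joint = max(product(range(n_actions), repeat=n_players), key=team_reward)

# Sequential-move view: fix one player's decision at a time (n * |A| evaluations),
# later players being provisionally set to action 0.
choice = []
for i in range(n_players):
    remaining = n_players - i - 1
    choice.append(max(range(n_actions),
                      key=lambda a: team_reward(choice + [a] + [0] * remaining)))

print(best_joint, tuple(choice))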
Abstract: State-of-the-art methods for solving 2-player zero-sum imperfect-information games rely on linear programming or regret minimization, but not on dynamic programming (DP) or heuristic search (HS), although the latter are often at the core of state-of-the-art solvers for other sequential decision-making problems. In partially observable or collaborative settings (e.g., POMDPs and Dec-POMDPs), DP and HS require introducing an appropriate statistic that induces a fully observable problem as well as bounding (convex) approximators of the optimal value function. This approach has also succeeded in some subclasses of 2-player zero-sum partially observable stochastic games (zs-POSGs), but how to apply it in the general case remains an open question. We answer it by (i) rigorously defining an equivalent game to work with, (ii) proving mathematical properties of the optimal value function that allow deriving bounds that come with solution strategies, (iii) proposing for the first time an HSVI-like solver that provably converges to an $\epsilon$-optimal solution in finite time, and (iv) empirically analyzing it. This opens the door to a novel family of promising approaches complementing those relying on linear programming or iterative methods.
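As a worked illustration of the bound-based machinery mentioned in items (ii)-(iii), the sketch below runs the upper/lower-bound sandwich idea behind HSVI-like solvers on a tiny, made-up 2-state MDP rather than a zs-POSG (transition matrix, rewards, and discount are invented); the paper's contribution is constructing the statistic and the convex bound approximators for which such a loop provably reaches an epsilon-optimal solution in the general zero-sum case.

import numpy as np

gamma, epsilon = 0.9, 1e-3
R = np.array([[1.0, 0.0], [0.0, 2.0]])            # R[s, a]
P = np.array([[[0.8, 0.2], [0.2, 0.8]],           # P[s, a, s']
              [[0.5, 0.5], [0.1, 0.9]]])

V_lo = np.full(2, R.min() / (1 - gamma))          # pessimistic initialization
V_up = np.full(2, R.max() / (1 - gamma))          # optimistic initialization

def backup(V):
    # Bellman backup tightens both bounds toward the optimal value function.
    return (R + gamma * P @ V).max(axis=1)

while (V_up - V_lo).max() > epsilon:
    V_lo, V_up = backup(V_lo), backup(V_up)

print(V_lo, V_up)                                 # epsilon-close sandwich around V*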
Abstract: Decentralized partially observable Markov decision processes (Dec-POMDPs) are rich models for cooperative decision-making under uncertainty, but they are often intractable to solve optimally (NEXP-complete). The transition- and observation-independent Dec-MDP is a general subclass that has been shown to have complexity in NP, but optimal algorithms for this subclass are still inefficient in practice. In this paper, we first provide an updated proof that an optimal policy does not depend on the histories of the agents, but only on their local observations. We then present a new algorithm based on heuristic search that is able to expand search nodes by using constraint optimization. We show experimental results comparing our approach with state-of-the-art Dec-MDP and Dec-POMDP solvers. These results show a reduction in computation time and an increase in scalability by multiple orders of magnitude on a number of benchmarks.
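A toy sketch of the node-expansion step described above, with invented observation spaces and an invented evaluation function: since optimal policies depend only on local observations, expanding a search node amounts to choosing one decision rule (local observation -> action) per agent; the brute-force enumeration below is what the paper replaces with a constraint-optimization formulation.

from itertools import product

local_obs = [0, 1]      # each agent's local observations (hypothetical)
actions = [0, 1]

def node_value(rule_1, rule_2):
    # Invented evaluation of a joint decision rule (a heuristic or expected
    # value in a real solver).
    (a10, a11), (a20, a21) = rule_1, rule_2
    return 3 * a10 - a11 + 2 * (a20 ^ a21) - (a10 & a20)

# A decision rule maps each local observation to an action.
rules = list(product(actions, repeat=len(local_obs)))
best = max(product(rules, repeat=2), key=lambda pair: node_value(*pair))
print(best, node_value(*best))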