Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mathijs M. de Weerdt

Bayesian Meta-Reinforcement Learning with Laplace Variational Recurrent Networks

May 24, 2025

Joery A. de Vries, Jinke He, Mathijs M. de Weerdt, Matthijs T. J. Spaan

Abstract:Meta-reinforcement learning trains a single reinforcement learning agent on a distribution of tasks to quickly generalize to new tasks outside of the training set at test time. From a Bayesian perspective, one can interpret this as performing amortized variational inference on the posterior distribution over training tasks. Among the various meta-reinforcement learning approaches, a common method is to represent this distribution with a point-estimate using a recurrent neural network. We show how one can augment this point estimate to give full distributions through the Laplace approximation, either at the start of, during, or after learning, without modifying the base model architecture. With our approximation, we are able to estimate distribution statistics (e.g., the entropy) of non-Bayesian agents and observe that point-estimate based methods produce overconfident estimators while not satisfying consistency. Furthermore, when comparing our approach to full-distribution based learning of the task posterior, our method performs on par with variational baselines while having much fewer parameters.

Via

Access Paper or Ask Questions

Optimal Decision Trees for Separable Objectives: Pushing the Limits of Dynamic Programming

May 31, 2023

Jacobus G. M. van der Linden, Mathijs M. de Weerdt, Emir Demirović

Abstract:Global optimization of decision trees has shown to be promising in terms of accuracy, size, and consequently human comprehensibility. However, many of the methods used rely on general-purpose solvers for which scalability remains an issue. Dynamic programming methods have been shown to scale much better because they exploit the tree structure by solving subtrees as independent subproblems. However, this only works when an objective can be optimized separately for subtrees. We explore this relationship in detail and show necessary and sufficient conditions for such separability and generalize previous dynamic programming approaches into a framework that can optimize any combination of separable objectives and constraints. Experiments on four application domains show the general applicability of this framework, while outperforming the scalability of general-purpose solvers by a large margin.

Via

Access Paper or Ask Questions

Solving Transition-Independent Multi-agent MDPs with Sparse Interactions (Extended version)

Feb 11, 2016

Joris Scharpff, Diederik M. Roijers, Frans A. Oliehoek, Matthijs T. J. Spaan, Mathijs M. de Weerdt

Figure 1 for Solving Transition-Independent Multi-agent MDPs with Sparse Interactions (Extended version)

Figure 2 for Solving Transition-Independent Multi-agent MDPs with Sparse Interactions (Extended version)

Figure 3 for Solving Transition-Independent Multi-agent MDPs with Sparse Interactions (Extended version)

Figure 4 for Solving Transition-Independent Multi-agent MDPs with Sparse Interactions (Extended version)

Abstract:In cooperative multi-agent sequential decision making under uncertainty, agents must coordinate to find an optimal joint policy that maximises joint value. Typical algorithms exploit additive structure in the value function, but in the fully-observable multi-agent MDP setting (MMDP) such structure is not present. We propose a new optimal solver for transition-independent MMDPs, in which agents can only affect their own state but their reward depends on joint transitions. We represent these dependencies compactly in conditional return graphs (CRGs). Using CRGs the value of a joint policy and the bounds on partially specified joint policies can be efficiently computed. We propose CoRe, a novel branch-and-bound policy search algorithm building on CRGs. CoRe typically requires less runtime than the available alternatives and finds solutions to problems previously unsolvable.

* This article is an extended version of the paper that was published under the same title in the Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI16), held in Phoenix, Arizona USA on February 12-17, 2016

Via

Access Paper or Ask Questions

Computing All-Pairs Shortest Paths by Leveraging Low Treewidth

Jan 18, 2014

Léon R. Planken, Mathijs M. de Weerdt, Roman P. J. van der Krogt

Figure 1 for Computing All-Pairs Shortest Paths by Leveraging Low Treewidth

Abstract:We present two new and efficient algorithms for computing all-pairs shortest paths. The algorithms operate on directed graphs with real (possibly negative) weights. They make use of directed path consistency along a vertex ordering d. Both algorithms run in O(n^2 w_d) time, where w_d is the graph width induced by this vertex ordering. For graphs of constant treewidth, this yields O(n^2) time, which is optimal. On chordal graphs, the algorithms run in O(nm) time. In addition, we present a variant that exploits graph separators to arrive at a run time of O(n w_d^2 + n^2 s_d) on general graphs, where s_d andlt= w_d is the size of the largest minimal separator induced by the vertex ordering d. We show empirically that on both constructed and realistic benchmarks, in many cases the algorithms outperform Floyd-Warshalls as well as Johnsons algorithm, which represent the current state of the art with a run time of O(n^3) and O(nm + n^2 log n), respectively. Our algorithms can be used for spatial and temporal reasoning, such as for the Simple Temporal Problem, which underlines their relevance to the planning and scheduling community.

* Journal Of Artificial Intelligence Research, Volume 43, pages 353-388, 2012

Via

Access Paper or Ask Questions