Tuomas Sandholm

Barriers to Welfare Maximization with No-Regret Learning

Nov 04, 2024, via arXiv

Computational Lower Bounds for Regret Minimization in Normal-Form Games

Nov 04, 2024, via arXiv

Faster Optimal Coalition Structure Generation via Offline Coalition Selection and Graph-Based Search

Jul 22, 2024, via arXiv

Imperfect-Recall Games: Equilibrium Concepts and Their Complexity

Jun 23, 2024, via arXiv

AlphaZeroES: Direct score maximization outperforms planning loss minimization

Jun 12, 2024, via arXiv

Scalable Mechanism Design for Multi-Agent Path Finding

Jan 30, 2024, via arXiv

Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property

Dec 21, 2023, via arXiv

Confronting Reward Model Overoptimization with Constrained RLHF

Oct 10, 2023, via arXiv

Planning in the imagination: High-level planning on learned abstract search spaces

Aug 16, 2023, via arXiv

Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations

Jul 22, 2023, via arXiv