Wendelin Böhmer

Parallelizing Tree Search with Twice Sequential Monte Carlo

Nov 18, 2025

Improving Robustness of AlphaZero Algorithms to Test-Time Environment Changes

Sep 04, 2025

Modular Recurrence in Contextual MDPs for Universal Morphology Control

Jun 10, 2025

Universal Value-Function Uncertainties

May 27, 2025

How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning

May 22, 2025

Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model

Mar 14, 2025

Training on more Reachable Tasks for Generalisation in Reinforcement Learning

Oct 04, 2024

Explore-Go: Leveraging Exploration for Generalisation in Deep Reinforcement Learning

Jun 12, 2024

A Penalty-Based Guardrail Algorithm for Non-Decreasing Optimization with Inequality Constraints

May 03, 2024

To the Max: Reinventing Reward in Reinforcement Learning

Feb 02, 2024