Timon Willi

Rethinking Rubric Generation for Improving LLM Judge and Reward Modeling for Open-ended Tasks

Feb 04, 2026
Via arXiv

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026
Via arXiv

Training AI Co-Scientists Using Rubric Rewards

Dec 29, 2025
Via arXiv

Balanced Accuracy: The Right Metric for Evaluating LLM Judges -- Explained through Youden's J statistic

Dec 08, 2025
Via arXiv

The Decrypto Benchmark for Multi-Agent Reasoning and Theory of Mind

Jun 25, 2025
Via arXiv

No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery

Aug 27, 2024
Via arXiv

Mixture of Experts in a Mixture of RL settings

Jun 26, 2024
Via arXiv

Mixtures of Experts Unlock Parameter Scaling for Deep RL

Feb 13, 2024
Via arXiv

Analysing the Sample Complexity of Opponent Shaping

Feb 08, 2024
Via arXiv

Leading the Pack: N-player Opponent Shaping

Dec 26, 2023
Via arXiv