Picture for Tim Rocktäschel

Tim Rocktäschel

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

Add code
Nov 20, 2024
Viaarxiv icon

Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models

Add code
Nov 19, 2024
Viaarxiv icon

TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation

Add code
Oct 04, 2024
Viaarxiv icon

Outliers and Calibration Sets have Diminishing Effect on Quantization of Modern LLMs

Add code
Jun 03, 2024
Viaarxiv icon

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

Add code
Feb 26, 2024
Figure 1 for Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Figure 2 for Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Figure 3 for Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Figure 4 for Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts
Viaarxiv icon

Genie: Generative Interactive Environments

Add code
Feb 23, 2024
Figure 1 for Genie: Generative Interactive Environments
Figure 2 for Genie: Generative Interactive Environments
Figure 3 for Genie: Generative Interactive Environments
Figure 4 for Genie: Generative Interactive Environments
Viaarxiv icon

Debating with More Persuasive LLMs Leads to More Truthful Answers

Add code
Feb 15, 2024
Viaarxiv icon

Multi-Agent Diagnostics for Robustness via Illuminated Diversity

Add code
Jan 24, 2024
Viaarxiv icon

Leading the Pack: N-player Opponent Shaping

Add code
Dec 26, 2023
Viaarxiv icon

Scaling Opponent Shaping to High Dimensional Games

Add code
Dec 19, 2023
Viaarxiv icon