
David Rein

HCAST: Human-Calibrated Autonomy Software Tasks

Mar 21, 2025
Via arXiv

Measuring AI Ability to Complete Long Tasks

Mar 18, 2025
Via arXiv

Training Language Models to Win Debates with Self-Play Improves Judge Accuracy

Sep 25, 2024
Via arXiv

GPQA: A Graduate-Level Google-Proof Q&A Benchmark

Nov 20, 2023
Via arXiv

Debate Helps Supervise Unreliable Experts

Nov 15, 2023
Via arXiv

Classification with Strategically Withheld Data

Jan 14, 2021
Via arXiv