Samuel F. Brown

AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Jun 12, 2024
Via arXiv