Felix Hofstätter

AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Jun 12, 2024
Via arXiv