Sam Marks

Alignment faking in large language models

Dec 18, 2024
Via arXiv