
Tim Belonax

Auditing language models for hidden objectives

Mar 14, 2025
Via arXiv

Alignment faking in large language models

Dec 18, 2024
Via arXiv