Martín Soto

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Feb 25, 2025
Via arXiv

Tell me about yourself: LLMs are aware of their learned behaviors

Jan 19, 2025
Via arXiv