Ryan S. Kwon

Representation Bending for Large Language Model Safety

Apr 02, 2025
Via arXiv