Judd Rosenblatt

Unexpected Benefits of Self-Modeling in Neural Systems

Jul 14, 2024
Via arXiv

Rethinking harmless refusals when fine-tuning foundation models

Jun 27, 2024
Via arXiv