Maximilian Li

Endless Jailbreaks with Bijection Learning

Oct 02, 2024
Via arXiv

Optimal ablation for interpretability

Sep 16, 2024
Via arXiv

Circuit Breaking: Removing Model Behaviors with Targeted Ablation

Sep 12, 2023
Via arXiv

Goal-Conditioned Imitation Learning using Score-based Diffusion Policies

Apr 05, 2023
Via arXiv