Abhay Sheshadri

Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization

Oct 16, 2024, via arXiv

Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

Jul 22, 2024, via arXiv

A Mechanistic Analysis of a Transformer Trained on a Symbolic Multi-Step Reasoning Task

Feb 28, 2024, via arXiv