Thomas Winninger

Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models

Mar 08, 2025
Via arXiv