Jean-Charles Noirot Ferrand

Targeting Alignment: Extracting Safety Classifiers of Aligned LLMs

Jan 27, 2025
Via arXiv

The Efficacy of Transformer-based Adversarial Attacks in Security Domains

Oct 17, 2023
Via arXiv