Davide Ghilardi

Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups

Oct 28, 2024
Via arXiv

h4rm3l: A Dynamic Benchmark of Composable Jailbreak Attacks for LLM Safety Assessment

Aug 09, 2024
Via arXiv