Davide Ghilardi

Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups

Oct 28, 2024

h4rm3l: A Dynamic Benchmark of Composable Jailbreak Attacks for LLM Safety Assessment

Aug 09, 2024