Aengus Lynch

Best-of-N Jailbreaking

Dec 04, 2024
Via arXiv

Targeted Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs

Jul 22, 2024
Via arXiv

Analyzing the Generalization and Reliability of Steering Vectors -- ICML 2024

Jul 17, 2024
Via arXiv

Eight Methods to Evaluate Robust Unlearning in LLMs

Feb 26, 2024
Via arXiv

Towards Automated Circuit Discovery for Mechanistic Interpretability

Apr 28, 2023
Via arXiv

Spawrious: A Benchmark for Fine Control of Spurious Correlation Biases

Mar 09, 2023
Via arXiv

Causal Machine Learning: A Survey and Open Problems

Jun 30, 2022
Via arXiv