Stefan Heimersheim

Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs

Oct 16, 2024

Evolution of SAE Features Across Layers in LLMs

Oct 11, 2024

Characterizing stable regions in the residual stream of LLMs

Sep 26, 2024

Using Degeneracy in the Loss Landscape for Mechanistic Interpretability

May 17, 2024

How to use and interpret activation patching

Apr 23, 2024

Towards Automated Circuit Discovery for Mechanistic Interpretability

Apr 28, 2023