Atticus Geiger

Combining Causal Models for More Accurate Abstractions of Neural Networks

Mar 14, 2025 (via arXiv)

HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks

Mar 13, 2025 (via arXiv)

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

Jan 29, 2025 (via arXiv)

Open Problems in Mechanistic Interpretability

Jan 27, 2025 (via arXiv)

Enhancing Automated Interpretability with Output-Centric Feature Descriptions

Jan 14, 2025 (via arXiv)

Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small

Sep 05, 2024 (via arXiv)

Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations

Aug 20, 2024 (via arXiv)

Updating CLIP to Prefer Descriptions Over Captions

Jun 12, 2024 (via arXiv)

ReFT: Representation Finetuning for Language Models

Apr 08, 2024 (via arXiv)

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

Mar 12, 2024 (via arXiv)