Picture for Atticus Geiger

Atticus Geiger

Enhancing Automated Interpretability with Output-Centric Feature Descriptions

Add code
Jan 14, 2025
Viaarxiv icon

Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small

Add code
Sep 05, 2024
Viaarxiv icon

Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations

Add code
Aug 20, 2024
Figure 1 for Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
Figure 2 for Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
Figure 3 for Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
Figure 4 for Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations
Viaarxiv icon

Updating CLIP to Prefer Descriptions Over Captions

Add code
Jun 12, 2024
Viaarxiv icon

ReFT: Representation Finetuning for Language Models

Add code
Apr 08, 2024
Viaarxiv icon

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

Add code
Mar 12, 2024
Viaarxiv icon

RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations

Add code
Feb 27, 2024
Figure 1 for RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
Figure 2 for RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
Figure 3 for RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
Figure 4 for RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations
Viaarxiv icon

A Reply to Makelov et al. 's "Interpretability Illusion" Arguments

Add code
Jan 23, 2024
Viaarxiv icon

Linear Representations of Sentiment in Large Language Models

Add code
Oct 23, 2023
Viaarxiv icon

Rigorously Assessing Natural Language Explanations of Neurons

Add code
Sep 19, 2023
Viaarxiv icon