Atticus Geiger

Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small

Sep 05, 2024

Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations

Aug 20, 2024

Updating CLIP to Prefer Descriptions Over Captions

Jun 12, 2024

ReFT: Representation Finetuning for Language Models

Apr 08, 2024

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

Mar 12, 2024

RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations

Feb 27, 2024

A Reply to Makelov et al.'s "Interpretability Illusion" Arguments

Jan 23, 2024

Linear Representations of Sentiment in Large Language Models

Oct 23, 2023

Rigorously Assessing Natural Language Explanations of Neurons

Sep 19, 2023

ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning

May 30, 2023