Neel Nanda

Google DeepMind

Are Sparse Autoencoders Useful? A Case Study in Sparse Probing

Feb 23, 2025
Via arXiv

Sparse Autoencoders Do Not Find Canonical Units of Analysis

Feb 07, 2025
Via arXiv

Open Problems in Mechanistic Interpretability

Jan 27, 2025
Via arXiv

BatchTopK Sparse Autoencoders

Dec 09, 2024
Via arXiv

Evaluating Sparse Autoencoders on Targeted Concept Erasure Tasks

Nov 28, 2024
Via arXiv

Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models

Nov 21, 2024
Via arXiv

Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2

Aug 09, 2024
Via arXiv

Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders

Jul 19, 2024
Via arXiv

Interpreting Attention Layer Outputs with Sparse Autoencoders

Jun 25, 2024
Via arXiv

Confidence Regulation Neurons in Language Models

Jun 24, 2024
Via arXiv