Picture for Tom Lieberum

Tom Lieberum

Google DeepMind

Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2

Add code
Aug 09, 2024
Figure 1 for Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Figure 2 for Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Figure 3 for Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Figure 4 for Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Viaarxiv icon

Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders

Add code
Jul 19, 2024
Figure 1 for Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
Figure 2 for Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
Figure 3 for Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
Figure 4 for Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
Viaarxiv icon

Improving Dictionary Learning with Gated Sparse Autoencoders

Add code
Apr 30, 2024
Viaarxiv icon

Evaluating Frontier Models for Dangerous Capabilities

Add code
Mar 20, 2024
Figure 1 for Evaluating Frontier Models for Dangerous Capabilities
Figure 2 for Evaluating Frontier Models for Dangerous Capabilities
Figure 3 for Evaluating Frontier Models for Dangerous Capabilities
Figure 4 for Evaluating Frontier Models for Dangerous Capabilities
Viaarxiv icon

AtP*: An efficient and scalable method for localizing LLM behaviour to components

Add code
Mar 01, 2024
Viaarxiv icon

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

Add code
Jul 24, 2023
Figure 1 for Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Figure 2 for Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Figure 3 for Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Figure 4 for Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Viaarxiv icon

Progress measures for grokking via mechanistic interpretability

Add code
Jan 13, 2023
Viaarxiv icon

Retrospective on the 2021 BASALT Competition on Learning from Human Feedback

Add code
Apr 14, 2022
Figure 1 for Retrospective on the 2021 BASALT Competition on Learning from Human Feedback
Figure 2 for Retrospective on the 2021 BASALT Competition on Learning from Human Feedback
Figure 3 for Retrospective on the 2021 BASALT Competition on Learning from Human Feedback
Figure 4 for Retrospective on the 2021 BASALT Competition on Learning from Human Feedback
Viaarxiv icon