Picture for János Kramár

János Kramár

Google DeepMind

Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2

Add code
Aug 09, 2024
Figure 1 for Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Figure 2 for Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Figure 3 for Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Figure 4 for Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
Viaarxiv icon

Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders

Add code
Jul 19, 2024
Figure 1 for Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
Figure 2 for Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
Figure 3 for Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
Figure 4 for Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
Viaarxiv icon

On scalable oversight with weak LLMs judging strong LLMs

Add code
Jul 05, 2024
Viaarxiv icon

Improving Dictionary Learning with Gated Sparse Autoencoders

Add code
Apr 30, 2024
Viaarxiv icon

AtP*: An efficient and scalable method for localizing LLM behaviour to components

Add code
Mar 01, 2024
Viaarxiv icon

Explaining grokking through circuit efficiency

Add code
Sep 05, 2023
Viaarxiv icon

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

Add code
Jul 24, 2023
Figure 1 for Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Figure 2 for Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Figure 3 for Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Figure 4 for Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla
Viaarxiv icon

Tracr: Compiled Transformers as a Laboratory for Interpretability

Add code
Jan 12, 2023
Viaarxiv icon

Learning to Play No-Press Diplomacy with Best Response Policy Iteration

Add code
Jun 17, 2020
Figure 1 for Learning to Play No-Press Diplomacy with Best Response Policy Iteration
Figure 2 for Learning to Play No-Press Diplomacy with Best Response Policy Iteration
Figure 3 for Learning to Play No-Press Diplomacy with Best Response Policy Iteration
Figure 4 for Learning to Play No-Press Diplomacy with Best Response Policy Iteration
Viaarxiv icon

OpenSpiel: A Framework for Reinforcement Learning in Games

Add code
Oct 10, 2019
Figure 1 for OpenSpiel: A Framework for Reinforcement Learning in Games
Figure 2 for OpenSpiel: A Framework for Reinforcement Learning in Games
Figure 3 for OpenSpiel: A Framework for Reinforcement Learning in Games
Figure 4 for OpenSpiel: A Framework for Reinforcement Learning in Games
Viaarxiv icon