Vikrant Varma

Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2

Aug 09, 2024
Via arXiv

Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders

Jul 19, 2024
Via arXiv

Improving Dictionary Learning with Gated Sparse Autoencoders

Apr 30, 2024
Via arXiv

Challenges with unsupervised LLM knowledge discovery

Dec 18, 2023
Via arXiv

Explaining grokking through circuit efficiency

Sep 05, 2023
Via arXiv

Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals

Oct 04, 2022
Via arXiv

Safe Deep RL in 3D Environments using Human Feedback

Jan 21, 2022
Via arXiv

Imitating Interactive Intelligence

Jan 21, 2021
Via arXiv