Adam Karvonen

Evaluating Sparse Autoencoders on Targeted Concept Erasure Tasks

Nov 28, 2024
Via arXiv

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models

Jul 31, 2024
Via arXiv

Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models

Mar 21, 2024
Via arXiv