Kola Ayonrinde

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability

Mar 13, 2025
Via arXiv

Adaptive Sparse Allocation with Mutual Choice & Feature Choice Sparse Autoencoders

Nov 04, 2024
Via arXiv

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs

Oct 15, 2024
Via arXiv