Gonçalo Paulo

Transcoders Beat Sparse Autoencoders for Interpretability

Jan 31, 2025
Via arXiv

Partially Rewriting a Transformer in Natural Language

Jan 31, 2025
Via arXiv

Sparse Autoencoders Trained on the Same Data Learn Different Features

Jan 29, 2025
Via arXiv

Automatically Interpreting Millions of Features in Large Language Models

Oct 17, 2024
Via arXiv

Does Transformer Interpretability Transfer to RNNs?

Apr 09, 2024
Via arXiv