Gonçalo Paulo

When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research

May 17, 2025
Via arXiv

Partially Rewriting a Transformer in Natural Language

Jan 31, 2025
Via arXiv

Transcoders Beat Sparse Autoencoders for Interpretability

Jan 31, 2025
Via arXiv

Sparse Autoencoders Trained on the Same Data Learn Different Features

Jan 29, 2025
Via arXiv

Automatically Interpreting Millions of Features in Large Language Models

Oct 17, 2024
Via arXiv

Does Transformer Interpretability Transfer to RNNs?

Apr 09, 2024
Via arXiv