Joseph Miller

Open Problems in Mechanistic Interpretability

Jan 27, 2025

Gradient Routing: Masking Gradients to Localize Computation in Neural Networks

Oct 06, 2024

Transformer Circuit Faithfulness Metrics are not Robust

Jul 11, 2024

Adversarial Policies Beat Professional-Level Go AIs

Nov 01, 2022