Lucius Bushnaq

Stochastic Parameter Decomposition

Jun 25, 2025

Identifying Sparsely Active Circuits Through Local Loss Landscape Decomposition

Mar 31, 2025

Open Problems in Mechanistic Interpretability

Jan 27, 2025

Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition

Jan 24, 2025

Towards evaluations-based safety cases for AI scheming

Nov 07, 2024

Using Degeneracy in the Loss Landscape for Mechanistic Interpretability

May 17, 2024

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

May 17, 2024