
Dan Braun

Stochastic Parameter Decomposition

Jun 25, 2025
Via arXiv

Parameterized Synthetic Text Generation with SimpleStories

Apr 12, 2025
Via arXiv

Interpretability in Parameter Space: Minimizing Mechanistic Description Length with Attribution-based Parameter Decomposition

Jan 24, 2025
Via arXiv

Towards evaluations-based safety cases for AI scheming

Nov 07, 2024
Via arXiv

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

May 17, 2024
Via arXiv

Using Degeneracy in the Loss Landscape for Mechanistic Interpretability

May 17, 2024
Via arXiv

Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning

May 17, 2024
Via arXiv

Interpreting Neural Networks through the Polytope Lens

Nov 22, 2022
Via arXiv