Title: PURE: Turning Polysemantic Neurons Into Pure Features by Identifying Relevant Circuits

Apr 09, 2024

Maximilian Dreyer, Erblina Purelku, Johanna Vielhaben, Wojciech Samek, Sebastian Lapuschkin

Abstract: The field of mechanistic interpretability aims to study the role of individual neurons in Deep Neural Networks. Single neurons, however, can act polysemantically and encode multiple (unrelated) features, which makes them difficult to interpret. We present a method for disentangling the polysemanticity of any Deep Neural Network by decomposing a polysemantic neuron into multiple monosemantic "virtual" neurons. This is achieved by identifying the relevant sub-graph ("circuit") for each "pure" feature. We demonstrate how our approach allows us to find and disentangle various polysemantic units of ResNet models trained on ImageNet. When evaluating feature visualizations using CLIP, our method effectively disentangles representations, improving upon methods based on neuron activations. Our code is available at https://github.com/maxdreyer/PURE.

* 14 pages (4 pages manuscript, 2 pages references, 8 pages appendix)
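To make the idea in the abstract concrete, here is a minimal toy sketch, not the authors' implementation (see the linked repository for that). It assumes a single linear neuron, uses input-times-weight as a stand-in attribution over lower-layer units, and groups samples with a plain k-means; the clustering choice, the helper names (`input_attributions`, `kmeans`), and the synthetic data are all assumptions made for illustration. The point is only that two concepts can produce nearly the same activation while relying on different lower-layer units, so grouping samples by their attribution pattern separates them into "virtual" neurons.

```python
import numpy as np


def input_attributions(x, w):
    """Toy attribution: contribution of each lower-layer unit to a linear
    neuron's pre-activation (x_i * w_i). Assumption: stands in for a real
    attribution method applied to a deep network."""
    return x * w


def kmeans(points, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(k - 1):
        # Distance of every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers


# Synthetic data (hypothetical): concept A drives lower-layer units 0 and 1,
# concept B drives units 2 and 3. The neuron weights all four equally.
rng = np.random.default_rng(0)
concept_a = np.array([1.0, 1.0, 0.0, 0.0]) + 0.05 * rng.standard_normal((20, 4))
concept_b = np.array([0.0, 0.0, 1.0, 1.0]) + 0.05 * rng.standard_normal((20, 4))
x = np.vstack([concept_a, concept_b])
w = np.ones(4)

# The activations alone look alike for both concepts (about 2.0 each).
activations = x @ w

# The attribution patterns differ, so clustering them separates the concepts.
attributions = input_attributions(x, w)
labels, centers = kmeans(attributions, k=2)

# Each cluster defines one "virtual" neuron: the original activation,
# restricted to the samples whose attribution pattern falls in that cluster.
virtual_activations = [activations * (labels == j) for j in range(2)]

print("mean activation, concept A:", activations[:20].mean())
print("mean activation, concept B:", activations[20:].mean())
print("cluster labels:", labels)
```

In this toy setting the activation values cannot tell the two concepts apart, while the per-sample attribution vectors can, which mirrors the abstract's contrast with methods based on neuron activations.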
