Benjamin Lerner

Evolution of SAE Features Across Layers in LLMs

Oct 11, 2024

Daniel Balcells, Benjamin Lerner, Michael Oesterle, Ediz Ucar, Stefan Heimersheim

Figures 1-4 for Evolution of SAE Features Across Layers in LLMs

Abstract: Sparse Autoencoders (SAEs) for transformer-based language models are typically defined independently per layer. In this work we analyze statistical relationships between features in adjacent layers to understand how features evolve through a forward pass. We provide a graph visualization interface for features and their most similar next-layer neighbors, and build communities of related features across layers. We find that a considerable number of features are passed through from a previous layer, that some features can be expressed as quasi-boolean combinations of previous features, and that some features become more specialized in later layers.
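
As a rough illustration of what finding a feature's "most similar next-layer neighbors" could look like, here is a minimal sketch. It assumes Pearson correlation of feature activations over tokens as the similarity measure, which the abstract does not specify. The function name `top_next_layer_neighbors`, the array shapes, and the synthetic activations are all hypothetical and are not taken from the paper.

```python
import numpy as np


def top_next_layer_neighbors(acts_a, acts_b, k=2, eps=1e-8):
    """Rank next-layer features by correlation with each current-layer feature.

    acts_a: (n_tokens, n_features_a) SAE activations at one layer.
    acts_b: (n_tokens, n_features_b) SAE activations at the next layer.
    Returns (indices, scores), each of shape (n_features_a, k).
    ASSUMPTION: Pearson correlation is used as the similarity measure.
    """
    # Center each feature, then scale to unit norm so that the dot
    # product of two columns equals their Pearson correlation.
    a = acts_a - acts_a.mean(axis=0, keepdims=True)
    b = acts_b - acts_b.mean(axis=0, keepdims=True)
    a = a / (np.linalg.norm(a, axis=0, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=0, keepdims=True) + eps)
    corr = a.T @ b  # (n_features_a, n_features_b)

    # Sort each row in descending order of correlation and keep the top k.
    order = np.argsort(-corr, axis=1)[:, :k]
    scores = np.take_along_axis(corr, order, axis=1)
    return order, scores


# Synthetic stand-in for real SAE activations (hypothetical data).
rng = np.random.default_rng(0)
acts_a = np.maximum(rng.normal(size=(1000, 4)), 0.0)

acts_b = np.stack(
    [
        acts_a[:, 2],                           # pass-through of feature 2
        np.minimum(acts_a[:, 0], acts_a[:, 1]),  # AND-like combination
        np.maximum(rng.normal(size=1000), 0.0),  # unrelated feature
    ],
    axis=1,
)

neighbors, scores = top_next_layer_neighbors(acts_a, acts_b, k=2)
print(neighbors)
print(scores.round(3))
```

In this toy setup, next-layer feature 0 is an exact copy of current-layer feature 2, so it appears as that feature's top neighbor with a correlation close to 1. This mirrors the "passed through" case the abstract describes. The AND-like feature correlates only partially with each of its two inputs, which suggests why pairwise similarity alone may not be enough to identify quasi-boolean combinations.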
