
Kenzo Clauw

Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition

Aug 16, 2024

Kenzo Clauw, Sebastiano Stramaglia, Daniele Marinazzo

Abstract: This paper studies emergent phenomena in neural networks by focusing on grokking, where models suddenly generalize long after memorizing the training data. To understand this phase transition, we use higher-order mutual information to analyze the collective behavior (synergy) and shared properties (redundancy) of neurons during training. We identify distinct phases before grokking, which allows us to anticipate when it occurs. We attribute grokking to an emergent phase transition driven by synergistic interactions among the neurons as a whole. We also show that weight decay and weight initialization can enhance the emergent phase.

* ICML 2024 MI workshop
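The abstract does not say exactly which higher-order measure is used. As a hedged illustration of what "synergy" and "redundancy" mean in this setting, the sketch below computes the O-information (the measure named in the 2022 abstract on this page), Ω(X) = (n − 2) H(X) + Σ_j [H(X_j) − H(X_{−j})], on two toy systems of three binary variables. Negative Ω means the system is synergy-dominated; positive Ω means redundancy-dominated. The helper names `entropy` and `o_information` and the toy data are hypothetical, not the authors' code.

```python
import itertools
import math
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy (bits) of a list of hashable outcomes."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def o_information(rows):
    """O-information (bits) of the columns of `rows`, a list of equal-length tuples.

    Hypothetical helper: Omega = (n - 2) H(X) + sum_j [H(X_j) - H(X_{-j})].
    """
    n = len(rows[0])
    total = (n - 2) * entropy(rows)
    for j in range(n):
        h_j = entropy([r[j] for r in rows])                 # H(X_j)
        h_rest = entropy([r[:j] + r[j + 1:] for r in rows])  # H(X_{-j})
        total += h_j - h_rest
    return total


# XOR: z = x ^ y over all four equally likely (x, y); no pair alone predicts the third.
xor_rows = [(x, y, x ^ y) for x, y in itertools.product([0, 1], repeat=2)]
# Copy: three identical bits; every variable carries the same information.
copy_rows = [(b, b, b) for b in (0, 1)]

print(o_information(xor_rows))   # -1.0 (synergy-dominated)
print(o_information(copy_rows))  # 1.0 (redundancy-dominated)
```

The plug-in estimator is exact here because each toy distribution is enumerated with equal weights; on sampled neuron activations a dedicated estimator would be needed.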


Higher-order mutual information reveals synergistic sub-networks for multi-neuron importance

Nov 08, 2022

Kenzo Clauw, Sebastiano Stramaglia, Daniele Marinazzo


Abstract: Quantifying which neurons are important for the classification decision of a trained neural network is essential for understanding its inner workings. Previous work has mainly attributed importance to individual neurons. In this work, we study which groups of neurons contain synergistic or redundant information using a multivariate mutual information measure called the O-information. We observe that the first layer is dominated by redundancy, suggesting general shared features (e.g. edge detectors), while the last layer is dominated by synergy, indicating local class-specific features (e.g. concepts). Finally, we show that the O-information can be used to measure multi-neuron importance: re-training a synergistic sub-network results in only a minimal change in performance. These results suggest our method can be used for pruning and unsupervised representation learning.

* Paper presented at InfoCog @ NeurIPS 2022
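A minimal sketch of scoring a group of neurons with the O-information, assuming the activations are jointly Gaussian (the abstract does not state which estimator the authors use). Under that assumption every entropy in Ω = (n − 2) H(X) + Σ_j [H(X_j) − H(X_{−j})] reduces to the log-determinant of a covariance submatrix, so the score can be computed directly from a samples-by-neurons activation matrix. `gaussian_entropy`, `gaussian_o_information` and the synthetic "neurons" are hypothetical.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (nats) of a multivariate Gaussian with covariance `cov`."""
    cov = np.atleast_2d(cov)
    k = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * (k * np.log(2 * np.pi * np.e) + logdet)


def gaussian_o_information(acts):
    """O-information (nats) of the columns of `acts` (samples x neurons), Gaussian assumption."""
    acts = np.asarray(acts, dtype=float)
    n = acts.shape[1]
    cov = np.cov(acts, rowvar=False)  # rows are samples, columns are neurons
    omega = (n - 2) * gaussian_entropy(cov)
    for j in range(n):
        rest = [i for i in range(n) if i != j]
        # H(X_j) - H(X_{-j}) from the marginal variance and the complementary submatrix
        omega += gaussian_entropy(cov[j, j]) - gaussian_entropy(cov[np.ix_(rest, rest)])
    return omega


rng = np.random.default_rng(0)
m = 5000
# Three noisy copies of one latent signal: shared (redundant) information.
latent = rng.standard_normal(m)
redundant = np.column_stack([latent + 0.1 * rng.standard_normal(m) for _ in range(3)])
# Third unit encodes the sum of two independent units: information only in the whole (synergy).
x, y = rng.standard_normal(m), rng.standard_normal(m)
synergistic = np.column_stack([x, y, x + y + 0.1 * rng.standard_normal(m)])

print(gaussian_o_information(redundant))    # positive: redundancy-dominated
print(gaussian_o_information(synergistic))  # negative: synergy-dominated
```

Under the Gaussian assumption the (2πe) constant terms cancel in Ω, so only the log-determinants matter. Ranking candidate neuron groups by how negative their score is would be one way to look for synergistic sub-networks, though the abstract does not describe the authors' search procedure.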
