
Title: Towards Understanding Grokking: An Effective Theory of Representation Learning

May 20, 2022

Ziming Liu, Ouail Kitouni, Niklas Nolte, Eric J. Michaud, Max Tegmark, Mike Williams

Abstract: We aim to understand grokking, a phenomenon where models generalize long after overfitting their training set. We present both a microscopic analysis anchored by an effective theory and a macroscopic analysis of phase diagrams describing learning performance across hyperparameters. We find that generalization originates from structured representations whose training dynamics and dependence on training set size can be predicted by our effective theory in a toy setting. We observe empirically the presence of four learning phases: comprehension, grokking, memorization, and confusion. We find representation learning to occur only in a "Goldilocks zone" (including comprehension and grokking) between memorization and confusion. Compared to the comprehension phase, the grokking phase stays closer to the memorization phase, leading to delayed generalization. The Goldilocks phase is reminiscent of "intelligence from starvation" in Darwinian evolution, where resource limitations drive discovery of more efficient solutions. This study not only provides intuitive explanations of the origin of grokking, but also highlights the usefulness of physics-inspired tools, e.g., effective theories and phase diagrams, for understanding deep learning.
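The abstract names four learning phases that differ in whether, and how quickly, training and validation accuracy rise. As an illustration only, here is a minimal sketch of how a single training run could be sorted into one of them, assuming each run is logged as per-step training and validation accuracy. The helper names, the 90% "solved" threshold, and the 1,000-step delay cutoff are hypothetical choices for this sketch, not values or code taken from the paper.

```python
from typing import Optional, Sequence


def first_step_reaching(acc: Sequence[float], threshold: float) -> Optional[int]:
    """Index of the first logged step whose accuracy is >= threshold, else None."""
    for step, value in enumerate(acc):
        if value >= threshold:
            return step
    return None


def classify_phase(
    train_acc: Sequence[float],
    val_acc: Sequence[float],
    threshold: float = 0.9,  # hypothetical accuracy at which a set counts as "solved"
    max_delay: int = 1000,   # hypothetical cutoff separating grokking from comprehension
) -> str:
    """Assign a run to one of the four phases (illustrative criteria only)."""
    t_train = first_step_reaching(train_acc, threshold)
    t_val = first_step_reaching(val_acc, threshold)
    if t_train is None:
        # The model never fits its own training set.
        return "confusion"
    if t_val is None:
        # The training set is fitted, but validation accuracy never catches up.
        return "memorization"
    if t_val - t_train > max_delay:
        # Generalization arrives long after overfitting.
        return "grokking"
    # Training and validation accuracy rise close together.
    return "comprehension"


# Validation accuracy reaches the threshold 1,495 steps after training accuracy: prints "grokking".
print(classify_phase([0.1] * 5 + [1.0] * 2000, [0.1] * 1500 + [1.0] * 505))
```

Calling `classify_phase` once per hyperparameter setting and plotting the labels over the swept values would give a rough phase diagram in the spirit of the macroscopic analysis described above; the boundaries it draws depend entirely on the assumed thresholds.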

* 20 pages, 16 figures

View paper on OpenReview
