Title: A Theory of Emergent In-Context Learning as Implicit Structure Induction

Mar 14, 2023

Michael Hahn, Navin Goyal

Abstract: Scaling large language models (LLMs) leads to an emergent capacity to learn in-context from example demonstrations. Despite progress, theoretical understanding of this phenomenon remains limited. We argue that in-context learning relies on recombination of compositional operations found in natural language data. We derive an information-theoretic bound showing how in-context learning abilities arise from generic next-token prediction when the pretraining distribution has sufficient amounts of compositional structure, under linguistically motivated assumptions. A second bound provides a theoretical justification for the empirical success of prompting LLMs to output intermediate steps towards an answer. To validate theoretical predictions, we introduce a controlled setup for inducing in-context learning; unlike previous approaches, it accounts for the compositional nature of language. Trained transformers can perform in-context learning for a range of tasks, in a manner consistent with the theoretical results. Mirroring real-world LLMs in a miniature setup, in-context learning emerges when scaling parameters and data, and models perform better when prompted to output intermediate steps. Probing shows that in-context learning is supported by a representation of the input's compositional structure. Taken together, these results provide a step towards theoretical understanding of emergent behavior in large language models.
