Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:An Explanation of In-context Learning as Implicit Bayesian Inference

Nov 14, 2021

Sang Michael Xie, Aditi Raghunathan, Percy Liang, Tengyu Ma

Figure 1 for An Explanation of In-context Learning as Implicit Bayesian Inference

Figure 2 for An Explanation of In-context Learning as Implicit Bayesian Inference

Figure 3 for An Explanation of In-context Learning as Implicit Bayesian Inference

Figure 4 for An Explanation of In-context Learning as Implicit Bayesian Inference

Share this with someone who'll enjoy it:

Abstract:Large pretrained language models such as GPT-3 have the surprising ability to do in-context learning, where the model learns to do a downstream task simply by conditioning on a prompt consisting of input-output examples. Without being explicitly pretrained to do so, the language model learns from these examples during its forward pass without parameter updates on "out-of-distribution" prompts. Thus, it is unclear what mechanism enables in-context learning. In this paper, we study the role of the pretraining distribution on the emergence of in-context learning under a mathematical setting where the pretraining texts have long-range coherence. Here, language model pretraining requires inferring a latent document-level concept from the conditioning text to generate coherent next tokens. At test time, this mechanism enables in-context learning by inferring the shared latent concept between prompt examples and applying it to make a prediction on the test example. Concretely, we prove that in-context learning occurs implicitly via Bayesian inference of the latent concept when the pretraining distribution is a mixture of HMMs. This can occur despite the distribution mismatch between prompts and pretraining data. In contrast to messy large-scale pretraining datasets for in-context learning in natural language, we generate a family of small-scale synthetic datasets (GINC) where Transformer and LSTM language models both exhibit in-context learning. Beyond the theory which focuses on the effect of the pretraining distribution, we empirically find that scaling model size improves in-context accuracy even when the pretraining loss is the same.

View paper on

OpenReview

Share this with someone who'll enjoy it:

Title:An Explanation of In-context Learning as Implicit Bayesian Inference

Paper and Code