Title: Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks

Jun 04, 2024

Tianyu He, Darshil Doshi, Aritra Das, Andrey Gromov

Abstract: Large language models can solve tasks that were not present in the training set. This capability is believed to be due to in-context learning and skill composition. In this work, we study the emergence of in-context learning and skill composition in a collection of modular arithmetic tasks. Specifically, we consider a finite collection of linear modular functions $z = a x + b y \bmod p$ labeled by the vector $(a, b) \in \mathbb{Z}_p^2$. We use some of these tasks for pre-training and the rest for out-of-distribution testing. We empirically show that a GPT-style transformer exhibits a transition from in-distribution to out-of-distribution generalization as the number of pre-training tasks increases. We find that the smallest model capable of out-of-distribution generalization requires two transformer blocks, while for deeper models, the out-of-distribution generalization phase is transient, necessitating early stopping. Finally, we perform an interpretability study of the pre-trained models, revealing highly structured representations in both phases, and discuss the learnt algorithm.

* 21 pages, 19 figures
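
The task family in the abstract can be illustrated with a minimal Python sketch. It enumerates the task vectors $(a, b) \in \mathbb{Z}_p^2$, splits them into pre-training and out-of-distribution sets, and draws a few in-context examples $(x, y, z)$ for one task. The function names, the random split, the uniform sampling of $(x, y)$, and the values `p = 7` and `n_pretrain = 32` are all assumptions made for illustration; the abstract does not specify how the authors build their sequences or choose the split.

```python
import itertools
import random


def make_tasks(p):
    """Enumerate every task vector (a, b) in Z_p^2."""
    return list(itertools.product(range(p), repeat=2))


def split_tasks(tasks, n_pretrain, seed=0):
    """Randomly split tasks into pre-training and held-out (OOD) sets.

    Assumption: a uniform random split; the paper may choose tasks differently.
    """
    rng = random.Random(seed)
    shuffled = list(tasks)
    rng.shuffle(shuffled)
    return shuffled[:n_pretrain], shuffled[n_pretrain:]


def in_context_examples(task, p, n_examples, rng):
    """Sample (x, y, z) triples with z = (a*x + b*y) mod p for a single task.

    Assumption: x and y are drawn uniformly from Z_p.
    """
    a, b = task
    examples = []
    for _ in range(n_examples):
        x, y = rng.randrange(p), rng.randrange(p)
        examples.append((x, y, (a * x + b * y) % p))
    return examples


p = 7  # hypothetical small prime modulus
tasks = make_tasks(p)
pretrain_tasks, ood_tasks = split_tasks(tasks, n_pretrain=32)
sequence = in_context_examples(pretrain_tasks[0], p, n_examples=4, rng=random.Random(1))
```

In this sketch a model would see several triples from one task in its context and be asked to predict $z$ for a new $(x, y)$. Out-of-distribution evaluation would draw the task from `ood_tasks`, which never appear during pre-training.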
