Title: Acceleration of Grokking in Learning Arithmetic Operations via Kolmogorov-Arnold Representation

May 26, 2024

Yeachan Park, Minseok Kim, Yeoneung Kim

Abstract: We propose novel methodologies aimed at accelerating the grokking phenomenon, which refers to the rapid increase in test accuracy after a long period of overfitting, as reported by Power et al. (2022). Focusing on the grokking that arises when a transformer model learns binary arithmetic operations, we begin with a discussion of data augmentation in the case of commutative binary operations. To accelerate grokking further, we interpret arithmetic operations through the lens of the Kolmogorov-Arnold (KA) representation theorem, revealing its correspondence to the components of the transformer architecture: embedding, decoder block, and classifier. Observing the structure shared by the KA representations associated with binary operations, we suggest various transfer learning mechanisms that expedite grokking. This interpretation is substantiated through a series of rigorous experiments. In addition, our approach succeeds in learning two nonstandard arithmetic tasks: composition of operations and a system of equations. Furthermore, we show that, under embedding transfer, the model is capable of learning arithmetic operations using a limited number of tokens, which is also supported by a set of experiments.
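The abstract does not spell out the augmentation scheme, so the following is only a minimal sketch of one natural reading: for a commutative operation, every training example (a, b, c) implies that (b, a, c) is also valid, so the swapped pair can be added to the training set for free. The function names, the modulus `p = 7`, and the toy training split are all illustrative assumptions and are not taken from the paper.

```python
from itertools import product


def make_dataset(p, op):
    """Return all triples (a, b, op(a, b) mod p) for a, b in 0..p-1."""
    return [(a, b, op(a, b) % p) for a, b in product(range(p), repeat=2)]


def augment_commutative(examples):
    """Add the swapped example (b, a, c) for each (a, b, c), skipping duplicates.

    Only valid when the underlying operation is commutative.
    """
    seen = set(examples)
    augmented = list(examples)
    for a, b, c in examples:
        swapped = (b, a, c)
        if swapped not in seen:
            seen.add(swapped)
            augmented.append(swapped)
    return augmented


p = 7  # illustrative modulus (assumption)
full = make_dataset(p, lambda a, b: a + b)  # modular addition is commutative

# Hypothetical training split: ten examples with a < b.
train = [ex for ex in full if ex[0] < ex[1]][:10]
train_aug = augment_commutative(train)

print(len(train), len(train_aug))
```

Because the toy split contains only pairs with a < b, none of the swapped pairs are already present, so augmentation doubles the number of training examples. For a non-commutative operation such as subtraction, the swapped labels would be wrong and this augmentation should not be applied.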
