Title: Universal Length Generalization with Turing Programs

Jul 03, 2024

Kaiying Hou, David Brandfonbrener, Sham Kakade, Samy Jelassi, Eran Malach

Abstract: Length generalization refers to the ability to extrapolate from short training sequences to long test sequences and is a challenge for current large language models. While prior work has proposed some architecture or data format changes to achieve length generalization, these proposals typically apply to a limited set of tasks. Building on prior scratchpad and Chain-of-Thought (CoT) techniques, we propose Turing Programs, a novel CoT strategy that decomposes an algorithmic task into steps mimicking the computation of a Turing Machine. This framework is both universal, as it can accommodate any algorithmic task, and simple, requiring only copying text from the context with small modifications. We show that by using Turing Programs, we obtain robust length generalization on a range of algorithmic tasks: addition, multiplication and in-context SGD. We then demonstrate that transformers achieve length generalization on random Turing Programs, suggesting that length generalization is possible for any algorithmic task. Finally, we theoretically prove that transformers can implement Turing Programs, constructing a simple RASP (Weiss et al.) program that simulates an arbitrary Turing machine.
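To make the idea in the abstract concrete, the following is a minimal sketch of what a Turing-machine-style scratchpad could look like: each step is a copy of the previous tape with one cell rewritten and the head marker moved. The trace format, the function name `turing_program_trace`, and the toy binary-increment machine are illustrative assumptions; the abstract does not specify the paper's exact scratchpad format.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: (state, read_symbol) -> (new_state, written_symbol, head_move)
Transitions = Dict[Tuple[str, str], Tuple[str, str, int]]

BLANK = "_"


def render(tape: List[str], head: int, state: str) -> str:
    """Render the tape as one scratchpad line, tagging the head cell with the state."""
    return " ".join(
        f"[{state}]{sym}" if i == head else sym for i, sym in enumerate(tape)
    )


def turing_program_trace(
    tape: str,
    transitions: Transitions,
    start_state: str,
    halt_state: str,
    max_steps: int = 1000,
) -> List[str]:
    """Simulate a Turing machine and return one scratchpad line per configuration."""
    cells = list(tape)
    head, state = 0, start_state
    lines = [render(cells, head, state)]
    for _ in range(max_steps):
        if state == halt_state:
            break
        new_state, written, move = transitions[(state, cells[head])]
        cells[head] = written          # the "small modification" to the copied tape
        head += move
        if head < 0:                   # grow the tape to the left if needed
            cells.insert(0, BLANK)
            head = 0
        elif head >= len(cells):       # grow the tape to the right if needed
            cells.append(BLANK)
        state = new_state
        lines.append(render(cells, head, state))
    return lines


# Toy machine (an assumption for illustration): increment a binary number
# written least-significant bit first.
INCREMENT: Transitions = {
    ("carry", "1"): ("carry", "0", +1),   # 1 + carry = 0, keep carrying
    ("carry", "0"): ("halt", "1", 0),     # 0 + carry = 1, done
    ("carry", BLANK): ("halt", "1", 0),   # ran off the end: write a new top bit
}

trace = turing_program_trace("110", INCREMENT, "carry", "halt")
for line in trace:
    print(line)
```

Consecutive lines of the trace differ in at most one tape symbol plus the head marker, which is the "copying text from the context with small modifications" property the abstract describes. Here the input `110` (3, least-significant bit first) ends as `001` (4).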
