Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically improve the performance of Large Language Models (LLMs), particularly on complex tasks involving mathematics or reasoning. Despite its enormous empirical success, the underlying mechanisms behind CoT and how it unlocks the potential of LLMs remain elusive. In this paper, we take a first step towards theoretically answering these questions. Specifically, we examine the capacity of LLMs with CoT to solve fundamental mathematical and decision-making problems. We start by giving an impossibility result showing that no bounded-depth Transformer can directly output correct answers for basic arithmetic/equation tasks unless its size grows super-polynomially with the input length. In contrast, we then prove by construction that autoregressive Transformers of constant size suffice to solve both tasks by generating CoT derivations in a commonly used math language format. Moreover, we show that LLMs with CoT can solve a general class of decision-making problems known as Dynamic Programming, thereby justifying the power of CoT in tackling complex real-world tasks. Finally, extensive experiments on four tasks show that, while Transformers always fail to directly predict the answers, they can consistently learn to generate correct solutions step by step given sufficient CoT demonstrations.
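
As a purely illustrative sketch of the kind of step-by-step derivation referred to above (this particular equation is hypothetical and not taken from the paper), a CoT solution to a simple linear equation written in standard math language might read:

\begin{align*}
2x + 3 &= 11 \\
2x &= 11 - 3 \\
2x &= 8 \\
x &= 4
\end{align*}

Each intermediate line corresponds to one CoT step; answering directly would correspond to emitting only the final line.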