Large Language Models (LLMs) have been touted as AI models possessing advanced reasoning abilities. In theory, autoregressive LLMs with Chain-of-Thought (CoT) can perform more serial computations to solve complex reasoning tasks. However, recent studies suggest that, despite this capacity, LLMs do not truly learn to reason but instead rely on statistical features. To study reasoning capabilities in a principled fashion, we adopt a computational-theory perspective and propose an experimental protocol centered on 3-SAT -- the prototypical NP-complete problem lying at the core of logical reasoning and constraint satisfaction tasks. Specifically, we examine the phase transitions in random 3-SAT and characterize the reasoning abilities of state-of-the-art LLMs by varying the inherent hardness of the problem instances. By comparing DeepSeek R1 with other LLMs, our findings reveal two key insights: (1) LLM accuracy drops significantly on harder instances, suggesting that all current models struggle when statistical shortcuts are unavailable; (2) unlike other LLMs, R1 shows signs of having learned the underlying reasoning. Following a principled experimental protocol, our study moves beyond the benchmark-driven evidence often found in LLM reasoning research. Our findings highlight important gaps and suggest clear directions for future research.
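As a rough illustration of the kind of instances such a protocol considers (not the paper's actual generator or evaluation code), the sketch below samples random 3-SAT formulas at a chosen clause-to-variable ratio alpha = m/n and brute-forces satisfiability for small n; hardness is known to peak near the satisfiability threshold, roughly alpha ≈ 4.27. All function names and parameters here are illustrative assumptions.

```python
import itertools
import random


def random_3sat(n_vars: int, alpha: float, seed: int = 0):
    """Sample a random 3-SAT formula with m = round(alpha * n_vars) clauses.

    Each clause draws 3 distinct variables and negates each with probability 1/2.
    Clauses are lists of signed integers (DIMACS-style literals).
    """
    rng = random.Random(seed)
    m = round(alpha * n_vars)
    formula = []
    for _ in range(m):
        chosen = rng.sample(range(1, n_vars + 1), 3)
        formula.append([v if rng.random() < 0.5 else -v for v in chosen])
    return formula


def is_satisfiable(formula, n_vars: int) -> bool:
    """Brute-force check over all 2^n assignments (only feasible for small n)."""
    for bits in itertools.product([False, True], repeat=n_vars):
        assignment = {i + 1: bits[i] for i in range(n_vars)}
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in formula):
            return True
    return False


if __name__ == "__main__":
    # Sweep the clause-to-variable ratio; the fraction of satisfiable
    # instances drops sharply around the threshold (alpha ~ 4.27).
    n = 12
    for alpha in [2.0, 3.0, 4.27, 5.0, 6.0]:
        sat_count = sum(
            is_satisfiable(random_3sat(n, alpha, seed=s), n) for s in range(20)
        )
        print(f"alpha={alpha:4.2f}: {sat_count}/20 instances satisfiable")
```

Instances drawn near the threshold are empirically the hardest for search-based solvers, which is what makes sweeping alpha a natural way to vary inherent problem hardness when probing LLM reasoning.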