Babak Haghighat

Phase Transitions in Large Language Models and the $O(N)$ Model

Jan 27, 2025

Youran Sun, Babak Haghighat

Abstract: Large language models (LLMs) exhibit unprecedentedly rich scaling behaviors. In physics, scaling behavior is closely related to phase transitions, critical phenomena, and field theory. To investigate the phase transition phenomena in LLMs, we reformulated the Transformer architecture as an $O(N)$ model. Our study reveals two distinct phase transitions corresponding to the temperature used in text generation and the model's parameter size, respectively. The first phase transition enables us to estimate the internal dimension of the model, while the second phase transition is of higher-depth and signals the emergence of new capabilities. As an application, the energy of the $O(N)$ model can be used to evaluate whether an LLM's parameters are sufficient to learn the training data.

Riemann-Theta Boltzmann Machine

Apr 06, 2018

Daniel Krefl, Stefano Carrazza, Babak Haghighat, Jens Kahlen

Abstract: A general Boltzmann machine with continuous visible and discrete integer-valued hidden states is introduced. Under mild assumptions about the connection matrices, the probability density function of the visible units can be solved for analytically, yielding a novel parametric density function involving a ratio of Riemann-Theta functions. The conditional expectation of a hidden state for given visible states can also be calculated analytically, yielding a derivative of the logarithmic Riemann-Theta function. The conditional expectation can be used as an activation function in a feedforward neural network, thereby increasing the modelling capacity of the network. Both the Boltzmann machine and the derived feedforward neural network can be successfully trained via standard gradient- and non-gradient-based optimization techniques.

* 25 pages, 11 figures, typos corrected
