Abstract: The performance of large language models (LLMs) strongly depends on the \textit{temperature} parameter. Empirically, at very low temperatures, LLMs generate sentences with clearly repetitive structures, while at very high temperatures, the generated sentences are often incomprehensible. In this study, using GPT-2, we numerically demonstrate that the difference between the two regimes is not merely a smooth change but a phase transition with singular, divergent statistical quantities. Our extensive analysis shows that critical behaviors, such as a power-law decay of correlations in a text, emerge both in the LLM at the transition temperature and in a natural language dataset. We also argue that several statistical quantities characterizing this criticality should be useful for evaluating the performance of LLMs.
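A minimal sketch of the kind of measurement described above, assuming the Hugging Face `transformers` GPT-2 checkpoint and an illustrative observable (cosine similarity between token embeddings at distance d); the study's actual correlation measure and pipeline may differ.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sample_tokens(prompt: str, temperature: float, max_new_tokens: int = 512) -> torch.Tensor:
    """Generate a token sequence from GPT-2 at the given sampling temperature."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.generate(
            **inputs,
            do_sample=True,
            temperature=temperature,
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
    return out[0]

def correlation(tokens: torch.Tensor, max_dist: int = 64) -> list[float]:
    """Illustrative correlation: mean cosine similarity of token embeddings at distance d."""
    emb = model.get_input_embeddings()(tokens)            # (seq_len, hidden)
    emb = torch.nn.functional.normalize(emb, dim=-1)
    return [float((emb[:-d] * emb[d:]).sum(-1).mean()) for d in range(1, max_dist + 1)]

# Compare how the correlation decays in the low-, intermediate-, and high-temperature regimes.
for T in (0.3, 1.0, 2.0):
    toks = sample_tokens("The meaning of life is", temperature=T)
    print(T, correlation(toks)[:8])
```

At low temperature the repetitive structure keeps correlations high over long distances, while at high temperature they drop quickly; a power-law decay around the transition temperature would be the signature of criticality discussed in the abstract.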
Abstract: Model-based sequential approaches to discrete "black-box" optimization, including Bayesian optimization techniques, often access the same points multiple times for a given objective function of interest, resulting in many steps to find the global optimum. Here, we numerically study the effect of a postprocessing method for Bayesian optimization that strictly prohibits duplicated samples in the dataset. We find that the postprocessing method significantly reduces the number of sequential steps needed to find the global optimum, especially when the acquisition function is based on maximum a posteriori estimation. Our results provide a simple but general strategy for mitigating the slow convergence of Bayesian optimization in high-dimensional problems.
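A toy sketch of the idea, not the authors' implementation: a discrete Bayesian-optimization loop on binary strings with a hypothetical objective, using scikit-learn's Gaussian process and the posterior mean as a stand-in for a MAP-type acquisition, where points already in the dataset are strictly excluded from the next query (the "no-duplicates" postprocessing).

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def objective(x: np.ndarray) -> float:
    # hypothetical black-box objective on binary strings (maximized by the alternating string)
    return -float(np.sum((x - np.arange(len(x)) % 2) ** 2))

n = 8                                                     # small dimension so the domain can be enumerated
candidates = np.array(np.meshgrid(*[[0.0, 1.0]] * n)).T.reshape(-1, n)

X = [candidates[rng.integers(len(candidates))]]           # random initial sample
y = [objective(X[0])]

for step in range(30):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
    gp.fit(np.array(X), np.array(y))
    acq = gp.predict(candidates)                          # posterior mean as a MAP-like acquisition
    visited = {tuple(x) for x in X}                       # postprocessing: forbid duplicated samples
    acq = np.where([tuple(c) not in visited for c in candidates], acq, -np.inf)
    x_next = candidates[int(np.argmax(acq))]
    X.append(x_next)
    y.append(objective(x_next))

print("best value found:", max(y))
```

Without the masking step, a purely exploitative acquisition such as the posterior mean can repeatedly propose the same already-evaluated point and stall; excluding visited points forces each step to add new information, which is the effect the abstract reports.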