https://github.com/Zodiark-ch/Emergence-of-LLMs/.
Emergence, broadly conceptualized as the ``intelligent'' behaviors of LLMs, has recently been studied and proved challenging to quantify due to the lack of a measurable definition. Most commonly, it has been estimated statistically through model performances across extensive datasets and tasks, which consumes significant resources. In addition, such estimation is difficult to interpret and may not accurately reflect the models' intrinsic emergence. In this work, we propose a quantifiable solution for estimating emergence. Inspired by emergentism in dynamics, we quantify the strength of emergence by comparing the entropy reduction of the macroscopic (semantic) level with that of the microscopic (token) level, both of which are derived from the representations within the transformer block. Using a low-cost estimator, our quantification method demonstrates consistent behaviors across a suite of LMs (GPT-2, GEMMA, etc.) under both in-context learning (ICL) and natural sentences. Empirical results show that (1) our method gives consistent measurements which align with existing observations based on performance metrics, validating the effectiveness of our emergence quantification; (2) our proposed metric uncovers novel emergence patterns such as the correlations between the variance of our metric and the number of ``shots'' in ICL, which further suggests a new way of interpreting hallucinations in LLMs; (3) we offer a potential solution towards estimating the emergence of larger and closed-resource LMs via smaller LMs like GPT-2. Our codes are available at: