
Javier Coronado-Blázquez

Stochastic Streets: A Walk Through Random LLM Address Generation in four European Cities

Sep 16, 2025

Tairan Fu, David Campo-Nazareno, Javier Coronado-Blázquez, Javier Conde, Pedro Reviriego, Fabrizio Lombardi

Abstract: Large Language Models (LLMs) are capable of solving complex math problems or answering difficult questions on almost any topic, but can they generate random street addresses for European cities?


Evaluating book summaries from internal knowledge in Large Language Models: a cross-model and semantic consistency approach

Mar 27, 2025

Javier Coronado-Blázquez

Abstract: We study the ability of large language models (LLMs) to generate comprehensive and accurate book summaries solely from their internal knowledge, without recourse to the original text. Using a diverse set of books and multiple LLM architectures, we examine whether these models can synthesize meaningful narratives that align with established human interpretations. Evaluation follows an LLM-as-a-judge paradigm: each AI-generated summary is compared against a high-quality, human-written summary via a cross-model assessment, in which every participating LLM evaluates not only its own outputs but also those produced by the others. This methodology makes it possible to identify potential biases, such as the tendency of models to favor their own summarization style over others. In addition, alignment between the human-written and LLM-generated summaries is quantified with ROUGE and BERTScore, which assess lexical and semantic correspondence respectively. The results reveal nuanced variations in content representation and stylistic preferences among the models, highlighting both the strengths and the limitations of relying on internal knowledge for summarization. These findings contribute to a deeper understanding of how LLMs internally encode factual information and of the dynamics of cross-model evaluation, with implications for the development of more robust natural language generation systems.

* 22 pages, 6 figures
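As a rough illustration of what a ROUGE-style score measures, here is a simplified ROUGE-1 F1 (clipped unigram overlap) in Python. It assumes lowercase whitespace tokenization and omits the stemming and other options of standard ROUGE implementations, so it is a sketch of the metric's idea rather than the paper's evaluation code; the function name `rouge1_f1` and the example sentences are made up for illustration.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between two texts.

    Assumes lowercase whitespace tokenization, with no stemming or stopword removal.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter & Counter keeps the minimum count of each shared token (clipping).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())  # share of candidate words found in the reference
    recall = overlap / sum(ref.values())      # share of reference words covered by the candidate
    return 2 * precision * recall / (precision + recall)


# Made-up sentences for illustration only.
reference = "the whale hunts the captain across the sea"
candidate = "the captain hunts the whale"
print(round(rouge1_f1(candidate, reference), 3))  # 0.769
```

Because ROUGE-1 only counts shared words, the candidate above scores about 0.77 even though it reverses who hunts whom. Embedding-based metrics such as BERTScore compare contextual token embeddings instead of exact word matches, which is one reason to pair a lexical metric with a semantic one.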


Deterministic or probabilistic? The psychology of LLMs as random number generators

Feb 27, 2025

Javier Coronado-Blázquez

Abstract: Large Language Models (LLMs) have transformed text generation through inherently probabilistic, context-aware mechanisms that mimic human natural language. In this paper, we systematically investigate how various LLMs perform when generating random numbers, considering diverse configurations such as different model architectures, numerical ranges, temperatures, and prompt languages. Our results reveal that, despite their stochastic transformer-based architecture, these models often exhibit deterministic responses when prompted for random numerical outputs. In particular, we find significant differences across models and across prompt languages, which we attribute to biases deeply embedded in the training data. Models such as DeepSeek-R1 can shed some light on the internal reasoning process of LLMs, even though they arrive at similar results. These biases induce predictable patterns that undermine genuine randomness, as LLMs are essentially reproducing our own human cognitive biases.

* 31 pages, 12 figures
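One standard way to quantify the kind of non-randomness described above is a Pearson chi-square goodness-of-fit test against a uniform distribution over the requested range. The sketch below is not the paper's methodology; the function `chi_square_uniform` is hypothetical, and the `responses` list is a made-up toy sample skewed toward 7, not data from the study.

```python
from collections import Counter


def chi_square_uniform(samples: list[int], low: int, high: int) -> float:
    """Pearson chi-square statistic of integer samples against a uniform
    distribution over the inclusive range [low, high]."""
    k = high - low + 1
    expected = len(samples) / k
    counts = Counter(samples)
    # Sum over every possible value, including values that never appeared.
    return sum((counts.get(v, 0) - expected) ** 2 / expected
               for v in range(low, high + 1))


# Hypothetical toy answers to "pick a random number from 1 to 10",
# deliberately skewed toward 7 for illustration.
responses = [7] * 40 + [3] * 15 + [5] * 10 + [1, 2, 4, 6, 8, 9, 10] * 5
stat = chi_square_uniform(responses, 1, 10)
# 16.919 is the chi-square critical value for 9 degrees of freedom at alpha = 0.05.
print(f"chi2 = {stat:.1f}, reject uniformity: {stat > 16.919}")
```

With 10 possible values the test has 9 degrees of freedom. The toy sample gives a statistic of 110.0, far above the 5% critical value, so uniformity would be rejected; a truly uniform generator would usually stay below that threshold.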


A NLP Approach to "Review Bombing" in Metacritic PC Videogames User Ratings

May 10, 2024

Javier Coronado-Blázquez


Abstract: Many videogames suffer from "review bombing" when rated by users: a large volume of unusually low scores that in many cases do not reflect the real quality of the product. Using Metacritic's 50,000+ user score aggregations for PC games in English, we apply a Natural Language Processing (NLP) approach to understand the main words and concepts appearing in such cases, reaching 0.88 accuracy on a validation set when distinguishing between merely bad ratings and review bombings. By uncovering and analyzing the patterns that drive this phenomenon, these results could be used to further mitigate such situations.

* 11 pages, 4 figures. Accepted by Discover Artificial Intelligence but withdrawn due to the article processing charge (APC)
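The abstract does not name the classifier behind the 0.88 accuracy figure. As a hedged illustration of the general task (separating ordinary negative reviews from review-bombing ones by the words they contain), here is a multinomial naive Bayes over bag-of-words features in plain Python. The model choice, the class `NaiveBayesText`, and the four training reviews are all hypothetical, and the sketch makes no attempt to reproduce the reported result.

```python
import math
from collections import Counter


class NaiveBayesText:
    """Multinomial naive Bayes over bag-of-words features with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        # Vocabulary shared by all classes, used in the smoothing denominator.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        tokens = text.lower().split()
        n_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # Log prior plus the smoothed log likelihood of each token.
            score = math.log(self.class_counts[c] / n_docs)
            for tok in tokens:
                count = self.word_counts[c][tok]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best, best_score = c, score
        return best


# Hypothetical toy reviews; real data would be thousands of labeled user reviews.
texts = [
    "boring story and clunky combat",
    "poor performance and bad controls",
    "boycott the publisher drm ruined this",
    "epic exclusive deal boycott now",
]
labels = ["bad", "bad", "bombing", "bombing"]

model = NaiveBayesText().fit(texts, labels)
print(model.predict("boycott this drm"))             # bombing
print(model.predict("clunky combat and bad story"))  # bad
```

Words such as "boycott" or "drm" that appear only in the review-bombing examples pull a new review toward that class. A real pipeline would train on a large labeled corpus and report accuracy on a held-out validation set, as the abstract does.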
