Pamela D. Rivière

Department of Cognitive Science, University of California San Diego

Measuring and Modifying the Readability of English Texts with GPT-4

Oct 17, 2024

Sean Trott, Pamela D. Rivière

Abstract: The success of Large Language Models (LLMs) in other domains has raised the question of whether LLMs can reliably assess and manipulate the readability of text. We approach this question empirically. First, using a published corpus of 4,724 English text excerpts, we find that readability estimates produced "zero-shot" from GPT-4 Turbo and GPT-4o mini exhibit relatively high correlation with human judgments (r = 0.76 and r = 0.74, respectively), outperforming estimates derived from traditional readability formulas and various psycholinguistic indices. Then, in a pre-registered human experiment (N = 59), we ask whether Turbo can reliably make text easier or harder to read. We find evidence to support this hypothesis, though considerable variance in human judgments remains unexplained. We conclude by discussing the limitations of this approach, including limited scope, as well as the validity of the "readability" construct and its dependence on context, audience, and goal.

* 9 pages, 6 figures, workshop TSAR 2024

Bidirectional Transformer Representations of (Spanish) Ambiguous Words in Context: A New Lexical Resource and Empirical Analysis

Jun 20, 2024

Pamela D. Rivière, Anne L. Beatty-Martínez, Sean Trott

Abstract: Lexical ambiguity -- where a single wordform takes on distinct, context-dependent meanings -- serves as a useful tool to compare across different large language models' (LLMs') ability to form distinct, contextualized representations of the same stimulus. Few studies have systematically compared LLMs' contextualized word embeddings for languages beyond English. Here, we evaluate multiple bidirectional transformers' (BERTs') semantic representations of Spanish ambiguous nouns in context. We develop a novel dataset of minimal-pair sentences evoking the same or different sense for a target ambiguous noun. In a pre-registered study, we collect contextualized human relatedness judgments for each sentence pair. We find that various BERT-based LLMs' contextualized semantic representations capture some variance in human judgments but fall short of the human benchmark, and for Spanish -- unlike English -- model scale is uncorrelated with performance. We also identify stereotyped trajectories of target noun disambiguation as a proportion of traversal through a given LLM family's architecture, which we partially replicate in English. We contribute (1) a dataset of controlled, Spanish sentence stimuli with human relatedness norms, and (2) to our evolving understanding of the impact that LLM specification (architectures, training protocols) exerts on contextualized embeddings.

* 16 pages, 12 figures, submitted to conference (EMNLP 2024)

Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement

Mar 20, 2024

Catherine Arnett, Pamela D. Rivière, Tyler A. Chang, Sean Trott

Abstract: The relationship between language model tokenization and performance is an open area of research. Here, we investigate how different tokenization schemes impact number agreement in Spanish plurals. We find that morphologically-aligned tokenization performs similarly to other tokenization schemes, even when induced artificially for words that would not be tokenized that way during training. We then present exploratory analyses demonstrating that language model embeddings for different plural tokenizations have similar distributions along the embedding space axis that maximally distinguishes singular and plural nouns. Our results suggest that morphologically-aligned tokenization is a viable tokenization approach, and existing models already generalize some morphological patterns to new items. However, our results indicate that morphological tokenization is not strictly required for performance.
