Sean Trott

Department of Cognitive Science, University of California San Diego

Measuring and Modifying the Readability of English Texts with GPT-4

Oct 17, 2024

Sean Trott, Pamela D. Rivière

Abstract: The success of Large Language Models (LLMs) in other domains has raised the question of whether LLMs can reliably assess and manipulate the readability of text. We approach this question empirically. First, using a published corpus of 4,724 English text excerpts, we find that readability estimates produced "zero-shot" from GPT-4 Turbo and GPT-4o mini exhibit relatively high correlation with human judgments (r = 0.76 and r = 0.74, respectively), outperforming estimates derived from traditional readability formulas and various psycholinguistic indices. Then, in a pre-registered human experiment (N = 59), we ask whether Turbo can reliably make text easier or harder to read. We find evidence to support this hypothesis, though considerable variance in human judgments remains unexplained. We conclude by discussing the limitations of this approach, including limited scope, as well as the validity of the "readability" construct and its dependence on context, audience, and goal.

* 9 pages, 6 figures, workshop TSAR 2024
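
The abstract above reports Pearson correlations (r) between model-produced readability estimates and human judgments. As a rough illustration of how such a coefficient is computed, here is a minimal sketch. The function name `pearson_r` and the toy ratings are hypothetical and are not taken from the paper or its corpus.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values for illustration only, NOT data from the paper.
human_ratings = [1.0, 2.5, 3.0, 4.5, 5.0]
model_ratings = [1.5, 2.0, 3.5, 4.0, 5.5]

r = pearson_r(human_ratings, model_ratings)
print(f"r = {r:.2f}")
```

In practice the two lists would hold one human rating and one model estimate per text excerpt; how the paper elicited or aggregated those values is not shown here.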

Bidirectional Transformer Representations of (Spanish) Ambiguous Words in Context: A New Lexical Resource and Empirical Analysis

Jun 20, 2024

Pamela D. Rivière, Anne L. Beatty-Martínez, Sean Trott

Abstract: Lexical ambiguity -- where a single wordform takes on distinct, context-dependent meanings -- serves as a useful tool to compare across different large language models' (LLMs') ability to form distinct, contextualized representations of the same stimulus. Few studies have systematically compared LLMs' contextualized word embeddings for languages beyond English. Here, we evaluate multiple bidirectional transformers' (BERTs') semantic representations of Spanish ambiguous nouns in context. We develop a novel dataset of minimal-pair sentences evoking the same or different sense for a target ambiguous noun. In a pre-registered study, we collect contextualized human relatedness judgments for each sentence pair. We find that various BERT-based LLMs' contextualized semantic representations capture some variance in human judgments but fall short of the human benchmark, and for Spanish -- unlike English -- model scale is uncorrelated with performance. We also identify stereotyped trajectories of target noun disambiguation as a proportion of traversal through a given LLM family's architecture, which we partially replicate in English. We contribute (1) a dataset of controlled, Spanish sentence stimuli with human relatedness norms, and (2) to our evolving understanding of the impact that LLM specification (architectures, training protocols) exerts on contextualized embeddings.

* 16 pages, 12 figures, submitted to conference (EMNLP 2024)

Do language models capture implied discourse meanings? An investigation with exhaustivity implicatures of Korean morphology

May 15, 2024

Hagyeong Shin, Sean Trott

Abstract: Markedness in natural language is often associated with non-literal meanings in discourse. Differential Object Marking (DOM) in Korean is one instance of this phenomenon, where post-positional markers are selected based on both the semantic features of the noun phrases and the discourse features that are orthogonal to the semantic features. Previous work has shown that distributional models of language recover certain semantic features of words -- do these models capture implied discourse-level meanings as well? We evaluate whether a set of large language models are capable of associating discourse meanings with different object markings in Korean. Results suggest that discourse meanings of a grammatical marker can be more challenging to encode than those of a discourse marker.

* Proceedings of the Society for Computation in Linguistics (SCiL) 2024, Association for Computational Linguistics (ACL) Anthology

Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement

Mar 20, 2024

Catherine Arnett, Pamela D. Rivière, Tyler A. Chang, Sean Trott

Abstract: The relationship between language model tokenization and performance is an open area of research. Here, we investigate how different tokenization schemes impact number agreement in Spanish plurals. We find that morphologically-aligned tokenization performs similarly to other tokenization schemes, even when induced artificially for words that would not be tokenized that way during training. We then present exploratory analyses demonstrating that language model embeddings for different plural tokenizations have similar distributions along the embedding space axis that maximally distinguishes singular and plural nouns. Our results suggest that morphologically-aligned tokenization is a viable tokenization approach, and existing models already generalize some morphological patterns to new items. However, our results indicate that morphological tokenization is not strictly required for performance.

Do Large Language Models know what humans know?

Sep 04, 2022

Sean Trott, Cameron Jones, Tyler Chang, James Michaelov, Benjamin Bergen

Abstract: Humans can attribute mental states to others, a capacity known as Theory of Mind. However, it is unknown to what extent this ability results from an innate biological endowment or from experience accrued through child development, particularly exposure to language describing others' mental states. We test the viability of the language exposure hypothesis by assessing whether models exposed to large quantities of human language develop evidence of Theory of Mind. In a pre-registered analysis, we present a linguistic version of the False Belief Task, widely used to assess Theory of Mind, to both human participants and a state-of-the-art Large Language Model, GPT-3. Both are sensitive to others' beliefs, but the language model does not perform as well as the humans, nor does it explain the full extent of their behavior, despite being exposed to more language than a human would in a lifetime. This suggests that while language exposure may in part explain how humans develop Theory of Mind, other mechanisms are also responsible.

Contextualized Sensorimotor Norms: multi-dimensional measures of sensorimotor strength for ambiguous English words, in context

Mar 10, 2022

Sean Trott, Benjamin Bergen

Abstract: Most large language models are trained on linguistic input alone, yet humans appear to ground their understanding of words in sensorimotor experience. A natural solution is to augment LM representations with human judgments of a word's sensorimotor associations (e.g., the Lancaster Sensorimotor Norms), but this raises another challenge: most words are ambiguous, and judgments of words in isolation fail to account for this multiplicity of meaning (e.g., "wooden table" vs. "data table"). We attempted to address this problem by building a new lexical resource of contextualized sensorimotor judgments for 112 English words, each rated in four different contexts (448 sentences total). We show that these ratings encode overlapping but distinct information from the Lancaster Sensorimotor Norms, and that they also predict other measures of interest (e.g., relatedness), above and beyond measures derived from BERT. Beyond shedding light on theoretical questions, we suggest that these ratings could be of use as a "challenge set" for researchers building grounded language models.

RAW-C: Relatedness of Ambiguous Words--in Context

May 27, 2021

Sean Trott, Benjamin Bergen

Abstract: Most words are ambiguous--i.e., they convey distinct meanings in different contexts--and even the meanings of unambiguous words are context-dependent. Both phenomena present a challenge for NLP. Recently, the advent of contextualized word embeddings has led to success on tasks involving lexical ambiguity, such as Word Sense Disambiguation. However, there are few tasks that directly evaluate how well these contextualized embeddings accommodate the more continuous, dynamic nature of word meaning--particularly in a way that matches human intuitions. We introduce RAW-C, a dataset of graded, human relatedness judgments for 112 ambiguous words in context (with 672 sentence pairs total), as well as human estimates of sense dominance. The average inter-annotator agreement (assessed using a leave-one-annotator-out method) was 0.79. We then show that a measure of cosine distance, computed using contextualized embeddings from BERT and ELMo, correlates with human judgments, but that cosine distance also systematically underestimates how similar humans find uses of the same sense of a word to be, and systematically overestimates how similar humans find uses of different-sense homonyms. Finally, we propose a synthesis between psycholinguistic theories of the mental lexicon and computational models of lexical semantics.

* ACL-IJCNLP 2021 camera-ready
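
The abstract above compares human relatedness judgments with cosine distance between contextualized embeddings. The sketch below shows only the cosine-distance computation itself. The function name `cosine_distance` and the short toy vectors are hypothetical; real BERT or ELMo embeddings have hundreds of dimensions and are not reproduced here.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical toy vectors standing in for embeddings of one target word
# in two different sentence contexts. NOT values from the paper.
embedding_context_a = [0.2, 0.7, 0.1]
embedding_context_b = [0.3, 0.6, 0.2]

distance = cosine_distance(embedding_context_a, embedding_context_b)
print(f"cosine distance = {distance:.3f}")
```

A smaller distance means the two usages are represented more similarly. Correlating such distances with human ratings across sentence pairs is the kind of comparison the abstract describes, though the paper's exact pipeline is not shown here.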

(Re)construing Meaning in NLP

May 18, 2020

Sean Trott, Tiago Timponi Torrent, Nancy Chang, Nathan Schneider

Abstract: Human speakers have an extensive toolkit of ways to express themselves. In this paper, we engage with an idea largely absent from discussions of meaning in natural language understanding--namely, that the way something is expressed reflects different ways of conceptualizing or construing the information being conveyed. We first define this phenomenon more precisely, drawing on considerable prior work in theoretical cognitive semantics and psycholinguistics. We then survey some dimensions of construed meaning and show how insights from construal could inform theoretical and practical work in NLP.

* ACL 2020 camera-ready

Processing Natural Language About Ongoing Actions

Jul 30, 2016

Steve Doubleday, Sean Trott, Jerome Feldman

Abstract: Actions may not proceed as planned; they may be interrupted, resumed or overridden. This is a challenge to handle in a natural language understanding system. We describe extensions to an existing implementation for the control of autonomous systems by natural language, to enable such systems to handle incoming language requests regarding actions. Language Communication with Autonomous Systems (LCAS) has been extended with support for X-nets, parameterized executable schemas representing actions. X-nets enable the system to control actions at a desired level of granularity, while providing a mechanism for language requests to be processed asynchronously. Standard semantics supported include requests to stop, continue, or override the existing action. The specific domain demonstrated is the control of motion of a simulated robot, but the approach is general, and could be applied to other domains.

* 6 pages, 8 figures. Updated with PIPE citations

Exploiting Deep Semantics and Compositionality of Natural Language for Human-Robot-Interaction

Apr 22, 2016

Manfred Eppe, Sean Trott, Jerome Feldman

Abstract: We develop a natural language interface for human-robot interaction that implements reasoning about deep semantics in natural language. To realize the required deep analysis, we employ methods from cognitive linguistics, namely the modular and compositional framework of Embodied Construction Grammar (ECG) [Feldman, 2009]. Using ECG, robots are able to solve fine-grained reference resolution problems and other issues related to deep semantics and compositionality of natural language. This also includes verbal interaction with humans to clarify commands and queries that are too ambiguous to be executed safely. We implement our NLU framework as a ROS package and present proof-of-concept scenarios with different robots, as well as a survey on the state of the art.
