Roger P. Levy

On the Same Wavelength? Evaluating Pragmatic Reasoning in Language Models across Broad Concepts

Sep 08, 2025

Linlu Qiu, Cedegao E. Zhang, Joshua B. Tenenbaum, Yoon Kim, Roger P. Levy

Abstract: Language use is shaped by pragmatics -- i.e., reasoning about communicative goals and norms in context. As language models (LMs) are increasingly used as conversational agents, it becomes ever more important to understand their pragmatic reasoning abilities. We propose an evaluation framework derived from Wavelength, a popular communication game where a speaker and a listener communicate about a broad range of concepts in a granular manner. We study a range of LMs on both language comprehension and language production using direct and Chain-of-Thought (CoT) prompting, and further explore a Rational Speech Act (RSA) approach to incorporating Bayesian pragmatic reasoning into LM inference. We find that state-of-the-art LMs, but not smaller ones, achieve strong performance on language comprehension, obtaining similar-to-human accuracy and exhibiting high correlations with human judgments even without CoT prompting or RSA. On language production, CoT can outperform direct prompting, and using RSA provides significant improvements over both approaches. Our study helps identify the strengths and limitations in LMs' pragmatic reasoning abilities and demonstrates the potential for improving them with RSA, opening up future avenues for understanding conceptual representation, language understanding, and social reasoning in LMs and humans.

* EMNLP 2025 (Main)
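The abstract refers to a Rational Speech Act (RSA) approach to Bayesian pragmatic reasoning. As a minimal sketch of the standard RSA recursion (literal listener, pragmatic speaker, pragmatic listener), the Python below uses a hypothetical five-position dial and three hypothetical clues with hand-written literal meanings. It is a toy illustration of RSA itself and does not show how the paper combines RSA with LM inference.

```python
import numpy as np

# Hypothetical toy domain (not from the paper): five dial positions on a
# "cold" to "hot" spectrum and three candidate clues.
positions = np.arange(5)  # 0 = coldest ... 4 = hottest
utterances = ["freezing", "chilly", "warm"]

# Hypothetical literal semantics: lexicon[u, m] = 1 if clue u is true of position m.
lexicon = np.array([
    [1, 0, 0, 0, 0],  # "freezing"
    [1, 1, 1, 0, 0],  # "chilly"
    [0, 0, 1, 1, 1],  # "warm"
], dtype=float)
prior = np.full(len(positions), 1.0 / len(positions))  # uniform prior P(m)


def normalize(x, axis):
    return x / x.sum(axis=axis, keepdims=True)


def literal_listener(lexicon, prior):
    # L0(m | u) ∝ [[u]](m) * P(m); each row is a distribution over positions.
    return normalize(lexicon * prior, axis=1)


def pragmatic_speaker(lexicon, prior, alpha=1.0):
    # S1(u | m) ∝ exp(alpha * log L0(m | u)); utterance costs are omitted here.
    l0 = literal_listener(lexicon, prior)
    with np.errstate(divide="ignore"):
        utility = np.log(l0)  # log 0 = -inf, so false clues get zero probability
    return normalize(np.exp(alpha * utility), axis=0)  # each column sums to 1


def pragmatic_listener(lexicon, prior, alpha=1.0):
    # L1(m | u) ∝ S1(u | m) * P(m)
    s1 = pragmatic_speaker(lexicon, prior, alpha)
    return normalize(s1 * prior, axis=1)


chilly = utterances.index("chilly")
print(literal_listener(lexicon, prior)[chilly])    # ≈ [1/3, 1/3, 1/3, 0, 0]
print(pragmatic_listener(lexicon, prior)[chilly])  # ≈ [1/7, 4/7, 2/7, 0, 0]
```

Hearing "chilly", the literal listener spreads belief evenly over positions 0 to 2, while the pragmatic listener shifts toward position 1, reasoning that a speaker aiming at position 0 would more likely have said "freezing".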

Multimodal Input Aids a Bayesian Model of Phonetic Learning

Jul 22, 2024

Sophia Zhi, Roger P. Levy, Stephan C. Meylan

Abstract: One of the many tasks facing the typically developing child language learner is learning to discriminate between the distinctive sounds that make up words in their native language. Here we investigate whether multimodal information (specifically, adult speech coupled with video frames of speakers' faces) benefits a computational model of phonetic learning. We introduce a method for creating high-quality synthetic videos of speakers' faces for an existing audio corpus. Our learning model, when both trained and tested on audiovisual inputs, achieves up to an 8.1% relative improvement on a phoneme discrimination battery compared to a model trained and tested on audio-only input. It also outperforms the audio model by up to 3.9% when both are tested on audio-only data, suggesting that visual information facilitates the acquisition of acoustic distinctions. Visual information is especially beneficial in noisy audio environments, where the audiovisual model closes 67% of the audio model's loss in discrimination performance in noise relative to a non-noisy environment. These results demonstrate that visual information benefits an ideal learner and illustrate some of the ways that children might be able to leverage visual cues when learning to discriminate speech sounds.

* 12 pages, 5 figures

Finding structure in logographic writing with library learning

May 11, 2024

Guangyuan Jiang, Matthias Hofer, Jiayuan Mao, Lionel Wong, Joshua B. Tenenbaum, Roger P. Levy

Abstract: One hallmark of human language is its combinatoriality -- reusing a relatively small inventory of building blocks to create a far larger inventory of increasingly complex structures. In this paper, we explore the idea that combinatoriality in language reflects a human inductive bias toward representational efficiency in symbol systems. We develop a computational framework for discovering structure in a writing system. Built on top of state-of-the-art library learning and program synthesis techniques, our computational framework discovers known linguistic structures in the Chinese writing system and reveals how the system evolves towards simplification under pressures for representational efficiency. We demonstrate how a library learning approach, utilizing learned abstractions and compression, may help reveal the fundamental computational principles that underlie the creation of combinatorial structures in human cognition, and offer broader insights into the evolution of efficient communication systems.

* Accepted at CogSci 2024 (Talk)

Testing the Predictions of Surprisal Theory in 11 Languages

Jul 10, 2023

Ethan Gotlieb Wilcox, Tiago Pimentel, Clara Meister, Ryan Cotterell, Roger P. Levy

Abstract: A fundamental result in psycholinguistics is that less predictable words take longer to process. One theoretical explanation for this finding is Surprisal Theory (Hale, 2001; Levy, 2008), which quantifies a word's predictability as its surprisal, i.e., its negative log-probability given a context. While evidence supporting the predictions of Surprisal Theory has been replicated widely, most studies have focused on a very narrow slice of data: native English speakers reading English texts. Indeed, no comprehensive multilingual analysis exists. We address this gap in the current literature by investigating the relationship between surprisal and reading times in eleven different languages, distributed across five language families. Deriving estimates from language models trained on monolingual and multilingual corpora, we test three predictions associated with Surprisal Theory: (i) whether surprisal is predictive of reading times; (ii) whether expected surprisal, i.e., contextual entropy, is predictive of reading times; and (iii) whether the linking function between surprisal and reading times is linear. We find that all three predictions are borne out crosslinguistically. By focusing on a more diverse set of languages, we argue that these results offer the most robust link to date between information theory and incremental language processing across languages.

* This is a pre-MIT Press publication version of the paper
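Surprisal and contextual entropy, the quantities behind predictions (i) and (ii), can be computed directly from a next-word distribution. The sketch below assumes a hypothetical, made-up distribution and reports both quantities in bits; the paper instead derives its estimates from trained language models.

```python
import math


def surprisal(prob):
    """Surprisal in bits: -log2 p(word | context)."""
    return -math.log2(prob)


def contextual_entropy(dist):
    """Expected surprisal (entropy, in bits) of a next-word distribution."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)


# Hypothetical next-word distribution for some context; the probabilities are
# invented for illustration and would normally come from a language model.
next_word = {"play": 0.5, "eat": 0.25, "read": 0.125, "swim": 0.125}

print(surprisal(next_word["play"]))   # 1.0
print(surprisal(next_word["swim"]))   # 3.0
print(contextual_entropy(next_word))  # 1.75
```

Prediction (iii) then concerns the shape of the mapping from these values to reading times: whether reading time grows linearly in surprisal rather than super- or sub-linearly.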

Unsupervised Discontinuous Constituency Parsing with Mildly Context-Sensitive Grammars

Dec 18, 2022

Songlin Yang, Roger P. Levy, Yoon Kim

Abstract: We study grammar induction with mildly context-sensitive grammars for unsupervised discontinuous parsing. Using the probabilistic linear context-free rewriting system (LCFRS) formalism, our approach fixes the rule structure in advance and focuses on parameter learning with maximum likelihood. To reduce the computational complexity of both parsing and parameter estimation, we restrict the grammar formalism to LCFRS-2 (i.e., binary LCFRS with fan-out two) and further discard rules that require O(n^6) time to parse, reducing inference to O(n^5). We find that using a large number of nonterminals is beneficial and thus make use of tensor decomposition-based rank-space dynamic programming with an embedding-based parameterization of rule probabilities to scale up the number of nonterminals. Experiments on German and Dutch show that our approach is able to induce linguistically meaningful trees with continuous and discontinuous structures.

* Preprint

Probing for Incremental Parse States in Autoregressive Language Models

Nov 17, 2022

Tiwalayo Eisape, Vineet Gangireddy, Roger P. Levy, Yoon Kim

Abstract: Next-word predictions from autoregressive neural language models show remarkable sensitivity to syntax. This work evaluates the extent to which this behavior arises as a result of a learned ability to maintain implicit representations of incremental syntactic structures. We extend work in syntactic probing to the incremental setting and present several probes for extracting incomplete syntactic structure (operationalized through parse states from a stack-based parser) from autoregressive language models. We find that our probes can be used to predict model preferences on ambiguous sentence prefixes and causally intervene on model representations and steer model behavior. This suggests implicit incremental syntactic inferences underlie next-word predictions in autoregressive neural language models.

* Findings of EMNLP 2022

How Adults Understand What Young Children Say

Jun 15, 2022

Stephan C. Meylan, Ruthe Foushee, Nicole H. Wong, Elika Bergelson, Roger P. Levy

Abstract: Children's early speech often bears little resemblance to adult speech in form or content, and yet caregivers often find meaning in young children's utterances. Precisely how caregivers are able to do this remains poorly understood. We propose that successful early communication (an essential building block of language development) relies not just on children's growing linguistic knowledge, but also on adults' sophisticated inferences. These inferences, we further propose, are optimized for fine-grained details of how children speak. We evaluate these ideas using a set of candidate computational models of spoken word recognition based on deep learning and Bayesian inference, which instantiate competing hypotheses regarding the information sources used by adults to understand children. We find that the best-performing models (evaluated on datasets of adult interpretations of child speech) are those that have strong prior expectations about what children are likely to want to communicate, rather than the actual phonetic contents of what children say. We further find that adults' behavior is best characterized as well-tuned to specific children: the more closely a word recognition model is tuned to the particulars of an individual child's actual linguistic behavior, the better it predicts adults' inferences about what the child has said. These results offer a comprehensive investigation into the role of caregivers as child-directed listeners, with broader consequences for theories of language acquisition.

* 19 pages, 6 figures, 2 tables
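The central Bayesian idea here is that the listener's posterior over intended words combines a prior over what the child wants to say with a likelihood of the observed form. A minimal sketch under loud assumptions: words are spelled strings rather than phonetic transcriptions, the likelihood is a hypothetical noise model that decays exponentially with edit distance, and both priors are invented. The paper's models, which combine deep learning and Bayesian inference, are not reproduced here.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def posterior(observed: str, prior: dict, beta: float = 1.0) -> dict:
    """P(word | observed) ∝ P(observed | word) * P(word).

    The likelihood is a hypothetical noise model: exp(-beta * edit distance).
    """
    scores = {w: p * math.exp(-beta * edit_distance(observed, w))
              for w, p in prior.items()}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


# Hypothetical priors over candidate words.
adult_prior = {"bad": 0.5, "ball": 0.25, "bath": 0.25}
child_prior = {"bad": 0.1, "ball": 0.6, "bath": 0.3}

heard = "ba"  # a reduced child production
for name, prior in [("adult", adult_prior), ("child-tuned", child_prior)]:
    post = posterior(heard, prior)
    print(name, max(post, key=post.get))  # adult -> bad, child-tuned -> ball
```

With the same reduced form and the same likelihood, swapping the prior flips the most probable interpretation, which is the sense in which strong prior expectations can outweigh phonetic content.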

Grammar-Based Grounded Lexicon Learning

Feb 17, 2022

Jiayuan Mao, Haoyue Shi, Jiajun Wu, Roger P. Levy, Joshua B. Tenenbaum

Abstract: We present Grammar-Based Grounded Lexicon Learning (G2L2), a lexicalist approach toward learning a compositional and grounded meaning representation of language from grounded data, such as paired images and texts. At the core of G2L2 is a collection of lexicon entries, which map each word to a tuple of a syntactic type and a neuro-symbolic semantic program. For example, the word shiny has a syntactic type of adjective; its neuro-symbolic semantic program has the symbolic form λx. filter(x, SHINY), where the concept SHINY is associated with a neural network embedding, which is used to classify shiny objects. Given an input sentence, G2L2 first looks up the lexicon entries associated with each token. It then derives the meaning of the sentence as an executable neuro-symbolic program by composing lexical meanings based on syntax. The recovered meaning programs can be executed on grounded inputs. To facilitate learning in an exponentially growing compositional space, we introduce a joint parsing and expected execution algorithm, which does local marginalization over derivations to reduce the training time. We evaluate G2L2 on two domains: visual reasoning and language-driven navigation. Results show that G2L2 can generalize from small amounts of data to novel compositions of words.

* NeurIPS 2021. Project page: https://g2l2.csail.mit.edu/
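The lexicon lookup and composition steps described in this abstract can be illustrated with a toy Python sketch. It assumes a hypothetical scene representation (each object is a set of concept labels), replaces the neural concept embeddings with symbolic label checks, and hard-codes a single [adjective* noun] pattern in place of G2L2's syntax-driven derivations and its joint parsing and expected execution algorithm.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

Scene = List[Set[str]]  # hypothetical: each object is a set of concept labels


@dataclass
class LexiconEntry:
    syntactic_type: str
    semantics: Callable[[Scene], Scene]


def filter_concept(objects: Scene, concept: str) -> Scene:
    # Stand-in for filter(x, CONCEPT). In G2L2 the concept is a neural
    # embedding used as a classifier; here it is a symbolic label lookup.
    return [obj for obj in objects if concept in obj]


lexicon = {
    "shiny": LexiconEntry("adjective", lambda x: filter_concept(x, "SHINY")),
    "red": LexiconEntry("adjective", lambda x: filter_concept(x, "RED")),
    "cube": LexiconEntry("noun", lambda x: filter_concept(x, "CUBE")),
}


def interpret(words: List[str], scene: Scene) -> Scene:
    """Look up each word, check an [adjective* noun] pattern, then compose."""
    entries = [lexicon[w] for w in words]
    types = [e.syntactic_type for e in entries]
    if types[-1] != "noun" or any(t != "adjective" for t in types[:-1]):
        raise ValueError(f"unsupported phrase type sequence: {types}")
    result = scene
    for entry in reversed(entries):  # shiny(red(cube(scene)))
        result = entry.semantics(result)
    return result


scene = [{"CUBE", "RED", "SHINY"}, {"CUBE", "RED"}, {"SPHERE", "SHINY"}]
print(interpret(["shiny", "red", "cube"], scene))  # only the first object
```

Each word contributes a program, and the phrase meaning is their composition executed against the scene, mirroring how λx. filter(x, SHINY) applies to the set denoted by the noun it modifies.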

A Targeted Assessment of Incremental Processing in Neural Language Models and Humans

Jun 06, 2021

Ethan Gotlieb Wilcox, Pranali Vani, Roger P. Levy

Abstract: We present a targeted, scaled-up comparison of incremental processing in humans and neural language models by collecting by-word reaction time data for sixteen different syntactic test suites across a range of structural phenomena. Human reaction time data come from a novel online experimental paradigm called the Interpolated Maze task. We compare human reaction times to by-word probabilities for four contemporary language models with different architectures, trained on a range of dataset sizes. We find that across many phenomena, both humans and language models show increased processing difficulty in ungrammatical sentence regions, with human and model 'accuracy' scores (à la Marvin and Linzen, 2018) about equal. However, although language model outputs match humans in direction, we show that models systematically under-predict the difference in magnitude of incremental processing difficulty between grammatical and ungrammatical sentences. Specifically, when models encounter syntactic violations, they fail to accurately predict the longer reaction times observed in the human data. These results call into question whether contemporary language models are approaching human-like performance for sensitivity to syntactic violations.

* To appear at ACL 2021
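The distinction between matching in direction and matching in magnitude can be made concrete. One common way to score 'accuracy' (assumed here; not necessarily the paper's exact procedure) is to count the minimal pairs in which the ungrammatical critical region is harder, measured by higher model surprisal or by longer human reaction times. All numbers below are hypothetical.

```python
from statistics import mean


def accuracy(grammatical, ungrammatical):
    """Fraction of minimal pairs whose ungrammatical critical region is harder."""
    pairs = list(zip(grammatical, ungrammatical))
    return sum(u > g for g, u in pairs) / len(pairs)


def mean_effect(grammatical, ungrammatical):
    """Mean difficulty difference, ungrammatical minus grammatical."""
    return mean(u - g for g, u in zip(grammatical, ungrammatical))


# Hypothetical critical-region measurements for four minimal pairs.
model_gram = [4.0, 5.0, 3.5, 6.0]       # surprisal, bits
model_ungram = [6.0, 7.5, 3.0, 9.0]     # surprisal, bits
human_gram = [900, 950, 880, 1000]      # reaction time, ms
human_ungram = [1300, 1400, 860, 1500]  # reaction time, ms

print(accuracy(model_gram, model_ungram), accuracy(human_gram, human_ungram))  # 0.75 0.75
print(mean_effect(model_gram, model_ungram), mean_effect(human_gram, human_ungram))  # 1.75 332.5
```

Equal accuracy scores say nothing about effect size. The mean differences are in different units (bits versus milliseconds), so a magnitude comparison like the one in the abstract also needs a mapping from surprisal to predicted reaction times, a step this sketch omits.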

Child-directed Listening: How Caregiver Inference Enables Children's Early Verbal Communication

Feb 09, 2021

Stephan C. Meylan, Ruthe Foushee, Elika Bergelson, Roger P. Levy

Abstract: How do adults understand children's speech? Children's productions over the course of language development often bear little resemblance to typical adult pronunciations, yet caregivers nonetheless reliably recover meaning from them. Here, we employ a suite of Bayesian models of spoken word recognition to understand how adults overcome the noisiness of child language, showing that communicative success between children and adults relies heavily on adult inferential processes. By evaluating competing models on phonetically-annotated corpora, we show that adults' recovered meanings are best predicted by prior expectations fitted specifically to the child language environment, rather than to typical adult-adult language. After quantifying the contribution of this "child-directed listening" over developmental time, we discuss the consequences for theories of language acquisition, as well as the implications for commonly-used methods for assessing children's linguistic proficiency.

* 13 pages, 3 figures, 2 tables. Edit #1 fixes the formatting of Table 1 (fitting it onto a single page) and reports the correct contents for Table 1 (the previous version reported nats, not bits)
