Emmanuel Chemla

A Neural Model for Word Repetition

Jun 16, 2025

Daniel Dager, Robin Sobczyk, Emmanuel Chemla, Yair Lakretz

Abstract:It takes several years for the developing brain of a baby to fully master word repetition, the task of hearing a word and repeating it aloud. Repeating a new word, such as one from a new language, can also be challenging for adults. Additionally, brain damage, such as from a stroke, may lead to systematic speech errors whose characteristics depend on the location of the damage. Cognitive sciences suggest a model with various components for the different processing stages involved in word repetition. While some studies have begun to localize the corresponding regions in the brain, the neural mechanisms and how exactly the brain performs word repetition remain largely unknown. We propose to bridge the gap between the cognitive model of word repetition and neural mechanisms in the human brain by modeling the task using deep neural networks. Neural models are fully observable, allowing us to study the detailed mechanisms in their various substructures and make comparisons with human behavior and, ultimately, the brain. Here, we make first steps in this direction by: (1) training a large set of models to simulate the word repetition task; (2) creating a battery of tests to probe the models for known effects from behavioral studies in humans; and (3) simulating brain damage through ablation studies, where we systematically remove neurons from the model and repeat the behavioral study to examine the resulting speech errors in the "patient" model. Our results show that neural models can mimic several effects known from human research, but might diverge in other aspects, highlighting both the potential and the challenges for future research aimed at developing human-like neural models.

* To appear at Cognitive Computational Neuroscience 2025 (CCN)
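
The ablation step described in the abstract (removing neurons and re-running the behavioral tests) can be illustrated with a minimal sketch. This is not the authors' code: the two-layer network, the `forward` function, and the `ablated` argument are hypothetical names chosen for illustration, and "removing" a neuron is assumed here to mean zeroing its activation.

```python
import numpy as np

def forward(x, W1, W2, ablated=()):
    """Run a tiny two-layer network, silencing the hidden units listed in `ablated`.

    Hypothetical toy model: it only illustrates ablation, not word repetition.
    """
    h = np.tanh(W1 @ x)                      # hidden activations
    mask = np.ones(h.shape[0])
    mask[np.array(list(ablated), dtype=int)] = 0.0
    return W2 @ (h * mask)                   # ablated units contribute nothing

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))                 # input (3) -> hidden (4)
W2 = rng.normal(size=(2, 4))                 # hidden (4) -> output (2)
x = rng.normal(size=3)

intact = forward(x, W1, W2)                  # "healthy" model
lesioned = forward(x, W1, W2, ablated=[1])   # "patient" model with unit 1 removed
```

Comparing `intact` and `lesioned` outputs across a test battery is the kind of analysis the abstract describes, here reduced to a single input.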

fastabx: A library for efficient computation of ABX discriminability

May 05, 2025

Maxime Poli, Emmanuel Chemla, Emmanuel Dupoux

Abstract:We introduce fastabx, a high-performance Python library for building ABX discrimination tasks. ABX is a measure of the separation between generic categories of interest. It has been used extensively to evaluate phonetic discriminability in self-supervised speech representations. However, its broader adoption has been limited by the absence of adequate tools. fastabx addresses this gap by providing a framework capable of constructing any type of ABX task while delivering the efficiency necessary for rapid development cycles, both in task creation and in calculating distances between representations. We believe that fastabx will serve as a valuable resource for the broader representation learning community, enabling researchers to systematically investigate what information can be directly extracted from learned representations across several domains beyond speech processing. The source code is available at https://github.com/bootphon/fastabx.

* 8 pages, 6 figures
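
As a rough illustration of what an ABX discriminability score measures, the sketch below computes it directly with NumPy. It does not use or reproduce the fastabx API: the function name `abx_score`, the Euclidean distance, and the tie handling are assumptions made for this example. A and X are drawn from one category and B from another; the score is the fraction of triples in which X is closer to A than to B.

```python
import numpy as np

def abx_score(cat_a, cat_b):
    """Fraction of (A, B, X) triples with d(A, X) < d(B, X).

    A and X are distinct items of `cat_a`; B is any item of `cat_b`.
    Ties count as half-correct. Illustrative only, not the fastabx API.
    """
    cat_a = np.asarray(cat_a, dtype=float)
    cat_b = np.asarray(cat_b, dtype=float)
    if len(cat_a) < 2 or len(cat_b) < 1:
        raise ValueError("need at least two items in cat_a and one in cat_b")
    # d_ax[i, j] = ||a_i - a_j||, d_bx[k, j] = ||b_k - a_j||
    d_ax = np.linalg.norm(cat_a[:, None, :] - cat_a[None, :, :], axis=-1)
    d_bx = np.linalg.norm(cat_b[:, None, :] - cat_a[None, :, :], axis=-1)
    correct, total = 0.0, 0
    for i in range(len(cat_a)):          # index of A
        for j in range(len(cat_a)):      # index of X
            if i == j:
                continue
            diff = d_bx[:, j] - d_ax[i, j]
            correct += np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
            total += len(cat_b)
    return float(correct / total)
```

A score near 1 means the two categories are well separated in the representation space; a score near 0.5 means the representation does not distinguish them.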

Large Language Models as Proxies for Theories of Human Linguistic Cognition

Feb 11, 2025

Imry Ziv, Nur Lan, Emmanuel Chemla, Roni Katzir

Abstract:We consider the possible role of current large language models (LLMs) in the study of human linguistic cognition. We focus on the use of such models as proxies for theories of cognition that are relatively linguistically-neutral in their representations and learning but differ from current LLMs in key ways. We illustrate this potential use of LLMs as proxies for theories of cognition in the context of two kinds of questions: (a) whether the target theory accounts for the acquisition of a given pattern from a given corpus; and (b) whether the target theory makes a given typologically-attested pattern easier to acquire than another, typologically-unattested pattern. For each of the two questions we show, building on recent literature, how current LLMs can potentially be of help, but we note that at present this help is quite limited.

Disentanglement and Compositionality of Letter Identity and Letter Position in Variational Auto-Encoder Vision Models

Dec 11, 2024

Bruno Bianchi, Aakash Agrawal, Stanislas Dehaene, Emmanuel Chemla, Yair Lakretz

Abstract:Human readers can accurately count how many letters are in a word (e.g., 7 in "buffalo"), remove a letter from a given position (e.g., "bufflo") or add a new one. The brains of readers must therefore have learned to disentangle information related to the position of a letter and its identity. Such disentanglement is necessary for the compositional, unbounded ability of humans to create and parse new strings, with any combination of letters appearing in any positions. Do modern deep neural models also possess this crucial compositional ability? Here, we tested whether neural models that achieve state-of-the-art disentanglement of features in visual input can also disentangle letter position and letter identity when trained on images of written words. Specifically, we trained beta variational autoencoders ($\beta$-VAEs) to reconstruct images of letter strings and evaluated their disentanglement performance using CompOrth, a new benchmark that we created for studying compositional learning and zero-shot generalization in visual models for orthography. The benchmark proposes a set of tests, of increasing complexity, to evaluate the degree of disentanglement between orthographic features of written words in deep neural models. Using CompOrth, we conducted a set of experiments to analyze the generalization ability of these models, in particular, to unseen word lengths and to unseen combinations of letter identities and letter positions. We found that while models effectively disentangle surface features, such as horizontal and vertical 'retinal' locations of words within an image, they dramatically fail to disentangle letter position and letter identity and lack any notion of word length. Together, this study demonstrates the shortcomings of state-of-the-art $\beta$-VAE models compared to humans and proposes a new challenge and a corresponding benchmark to evaluate neural models.
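
For readers unfamiliar with the model class, the standard $\beta$-VAE training objective can be written as below. This is the commonly cited general form, not a formula taken from the paper; the exact loss, encoder $q_\phi$, decoder $p_\theta$, and prior $p(z)$ used by the authors are not stated in the abstract and are assumed here.

```latex
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - \beta \, D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)
```

With $\beta = 1$ this reduces to the usual VAE evidence lower bound; values $\beta > 1$ weight the KL term more heavily, which is the pressure usually credited with encouraging disentangled latent dimensions.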

A polar coordinate system represents syntax in large language models

Dec 07, 2024

Pablo Diego-Simón, Stéphane D'Ascoli, Emmanuel Chemla, Yair Lakretz, Jean-Rémi King

Abstract:Originally formalized with symbolic representations, syntactic trees may also be effectively represented in the activations of large language models (LLMs). Indeed, a 'Structural Probe' can find a subspace of neural activations, where syntactically related words are relatively close to one another. However, this syntactic code remains incomplete: the distance between the Structural Probe word embeddings can represent the existence but not the type and direction of syntactic relations. Here, we hypothesize that syntactic relations are, in fact, coded by the relative direction between nearby embeddings. To test this hypothesis, we introduce a 'Polar Probe' trained to read syntactic relations from both the distance and the direction between word embeddings. Our approach reveals three main findings. First, our Polar Probe successfully recovers the type and direction of syntactic relations, and outperforms the Structural Probe by nearly twofold. Second, we confirm that this polar coordinate system exists in a low-dimensional subspace of the intermediate layers of many LLMs and becomes increasingly precise in the latest frontier models. Third, we demonstrate with a new benchmark that similar syntactic relations are coded similarly across the nested levels of syntactic trees. Overall, this work shows that LLMs spontaneously learn a geometry of neural activations that explicitly represents the main symbolic structures of linguistic theory.

* NeurIPS 2024
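
The distinction the abstract draws between distance and direction can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's probe: `B` stands for a linear projection that would normally be learned, and `probe_edge` is a hypothetical helper name. The distance part corresponds to what a distance-based probe reads; the unit direction is the additional signal that a direction-aware probe could use.

```python
import numpy as np

def probe_edge(h_head, h_dep, B):
    """Project two word embeddings with B and return (distance, unit direction).

    Distance alone is symmetric in the two words; the direction flips sign
    when head and dependent are swapped, so it can carry edge orientation.
    """
    delta = B @ (np.asarray(h_dep, dtype=float) - np.asarray(h_head, dtype=float))
    dist = float(np.linalg.norm(delta))
    direction = delta / dist if dist > 0 else np.zeros_like(delta)
    return dist, direction

# Hypothetical example: a projection that keeps the first two of three dimensions.
B = np.eye(3)[:2]
dist, direction = probe_edge([0.0, 0.0, 0.0], [3.0, 4.0, 7.0], B)
rev_dist, rev_direction = probe_edge([3.0, 4.0, 7.0], [0.0, 0.0, 0.0], B)
```

Swapping the two words leaves `dist` unchanged but negates `direction`, which is why distance alone cannot encode which word is the head.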

No Such Thing as a General Learner: Language models and their dual optimization

Aug 21, 2024

Emmanuel Chemla, Ryan M. Nefdt

Abstract:What role can the otherwise successful Large Language Models (LLMs) play in the understanding of human cognition, and in particular in terms of informing language acquisition debates? To contribute to this question, we first argue that neither humans nor LLMs are general learners, in a variety of senses. We make a novel case for how in particular LLMs follow a dual-optimization process: they are optimized during their training (which is typically compared to language acquisition), and modern LLMs have also been selected, through a process akin to natural selection in a species. From this perspective, we argue that the performance of LLMs, whether similar or dissimilar to that of humans, does not weigh easily on important debates about the importance of human cognitive biases for language.

* 11 pages, 4 figures

What Makes Two Language Models Think Alike?

Jun 24, 2024

Jeanne Salle, Louis Jalouzot, Nur Lan, Emmanuel Chemla, Yair Lakretz

Abstract:Do architectural differences significantly affect the way models represent and process language? We propose a new approach, based on metric-learning encoding models (MLEMs), as a first step to answer this question. The approach provides a feature-based comparison of how any two layers of any two models represent linguistic information. We apply the method to BERT, GPT-2 and Mamba. Unlike previous methods, MLEMs offer a transparent comparison, by identifying the specific linguistic features responsible for similarities and differences. More generally, the method uses formal, symbolic descriptions of a domain, and uses these to compare neural representations. As such, the approach can straightforwardly be extended to other domains, such as speech and vision, and to other neural systems, including human brains.

* 7 pages, 6 figures

What makes two models think alike?

Jun 18, 2024

Jeanne Salle, Louis Jalouzot, Nur Lan, Emmanuel Chemla, Yair Lakretz

* 7 pages, 6 figures

The Impact of Syntactic and Semantic Proximity on Machine Translation with Back-Translation

Mar 26, 2024

Nicolas Guerin, Shane Steinert-Threlkeld, Emmanuel Chemla

Abstract:Unsupervised on-the-fly back-translation, in conjunction with multilingual pretraining, is the dominant method for unsupervised neural machine translation. Theoretically, however, the method should not work in general. We therefore conduct controlled experiments with artificial languages to determine what properties of languages make back-translation an effective training method, covering lexical, syntactic, and semantic properties. We find, contrary to popular belief, that (i) parallel word frequency distributions, (ii) partially shared vocabulary, and (iii) similar syntactic structure across languages are not sufficient to explain the success of back-translation. We show however that even crude semantic signal (similar lexical fields across languages) does improve alignment of two languages through back-translation. We conjecture that rich semantic dependencies, parallel across languages, are at the root of the success of unsupervised methods based on back-translation. Overall, the success of unsupervised machine translation was far from being analytically guaranteed. Instead, it is another proof that languages of the world share deep similarities, and we hope to show how to identify which of these similarities can serve the development of unsupervised, cross-linguistic tools.

Metric-Learning Encoding Models Identify Processing Profiles of Linguistic Features in BERT's Representations

Feb 18, 2024

Louis Jalouzot, Robin Sobczyk, Bastien Lhopitallier, Jeanne Salle, Nur Lan, Emmanuel Chemla, Yair Lakretz

Abstract:We introduce Metric-Learning Encoding Models (MLEMs) as a new approach to understand how neural systems represent the theoretical features of the objects they process. As a proof-of-concept, we apply MLEMs to neural representations extracted from BERT, and track a wide variety of linguistic features (e.g., tense, subject person, clause type, clause embedding). We find that: (1) linguistic features are ordered: they separate representations of sentences to different degrees in different layers; (2) neural representations are organized hierarchically: in some layers, we find clusters of representations nested within larger clusters, following successively important linguistic features; (3) linguistic features are disentangled in middle layers: distinct, selective units are activated by distinct linguistic features. Methodologically, MLEMs are superior (4) to multivariate decoding methods, being more robust to type-I errors, and (5) to univariate encoding methods, in being able to predict both local and distributed representations. Together, this demonstrates the utility of Metric-Learning Encoding Models for studying how linguistic features are neurally encoded in language models and the advantage of MLEMs over traditional methods. MLEMs can be extended to other domains (e.g., vision) and to other neural systems, such as the human brain.

* 17 pages, 13 figures
