Jordan Kodner

LLMs can Perform Multi-Dimensional Analytic Writing Assessments: A Case Study of L2 Graduate-Level Academic English Writing

Feb 17, 2025

Zhengxiang Wang, Veronika Makarova, Zhi Li, Jordan Kodner, Owen Rambow

Abstract:The paper explores the performance of LLMs in the context of multi-dimensional analytic writing assessments, i.e. their ability to provide both scores and comments based on multiple assessment criteria. Using a corpus of literature reviews written by L2 graduate students and assessed by human experts against 9 analytic criteria, we prompt several popular LLMs to perform the same task under various conditions. To evaluate the quality of feedback comments, we apply a novel feedback comment quality evaluation framework. This framework is interpretable, cost-efficient, scalable, and reproducible, compared to existing methods that rely on manual judgments. We find that LLMs can generate reasonably good and generally reliable multi-dimensional analytic assessments. We release our corpus for reproducibility.

* 26 pages, 6 figures, 15 tables

Evaluating LLMs with Multiple Problems at once: A New Paradigm for Probing LLM Capabilities

Jun 16, 2024

Zhengxiang Wang, Jordan Kodner, Owen Rambow

Abstract:Current LLM evaluation predominantly performs evaluation with prompts comprising single problems. We propose multi-problem evaluation as an additional approach to study the multiple problem handling capabilities of LLMs. We present a systematic study in this regard by comprehensively examining 7 LLMs on 4 related types of tasks constructed from 6 classification benchmarks. The 4 task types include traditional single-problem tasks, homogeneous multi-problem tasks, and two index selection tasks that embed the multi-problem tasks. We find that LLMs are competent multi-problem solvers: they generally perform (nearly) as well on multi-problem tasks as on single-problem tasks. Furthermore, contrary to common expectation, they often do not suffer from a positional bias with long inputs. This makes multi-problem prompting a simple and cost-efficient prompting method of practical significance. However, our results also strongly indicate that LLMs lack true understanding: they perform significantly worse in the two index selection tasks than in the multi-problem task under various evaluation settings, although they can indeed do index selection in general.

* 20 pages, 15 figures, 9 tables
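
The contrast between single-problem and multi-problem prompting described in the abstract can be illustrated with a minimal sketch. The prompt wording, the function names, and the answer format below are illustrative assumptions, not the format used in the paper, and the index selection tasks are not modeled.

```python
from typing import List


def single_problem_prompts(instruction: str, texts: List[str]) -> List[str]:
    """Build one prompt per text (the traditional single-problem setup)."""
    # Hypothetical template: the paper's actual wording is not given here.
    return [f"{instruction}\n\nText: {text}\nAnswer:" for text in texts]


def multi_problem_prompt(instruction: str, texts: List[str]) -> str:
    """Pack several problems of the same kind into one numbered prompt."""
    numbered = "\n".join(
        f"{index}. {text}" for index, text in enumerate(texts, start=1)
    )
    # Hypothetical answer format, chosen so responses are easy to parse.
    return (
        f"{instruction}\n"
        "Answer every numbered text, one line each, as '<index>: <label>'.\n\n"
        f"{numbered}"
    )


if __name__ == "__main__":
    reviews = ["Great film.", "Dull and slow.", "A pleasant surprise."]
    task = "Classify the sentiment of the text as positive or negative."
    print(len(single_problem_prompts(task, reviews)), "separate prompts")
    print(multi_problem_prompt(task, reviews))
```

Packing problems into one prompt means the shared instruction is sent once rather than once per problem, which is the source of the cost savings the abstract mentions.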

Evaluating Neural Language Models as Cognitive Models of Language Acquisition

Oct 31, 2023

Héctor Javier Vázquez Martínez, Annika Lea Heuser, Charles Yang, Jordan Kodner

Abstract:The success of neural language models (LMs) on many technological tasks has brought about their potential relevance as scientific theories of language despite some clear differences between LM training and child language acquisition. In this paper we argue that some of the most prominent benchmarks for evaluating the syntactic capacities of LMs may not be sufficiently rigorous. In particular, we show that the template-based benchmarks lack the structural diversity commonly found in the theoretical and psychological studies of language. When trained on small-scale data modeling child language acquisition, the LMs can be readily matched by simple baseline models. We advocate for the use of the readily available, carefully curated datasets that have been evaluated for gradient acceptability by large pools of native speakers and are designed to probe the structural basis of grammar specifically. On one such dataset, the LI-Adger dataset, LMs evaluate sentences in a way inconsistent with human language users. We conclude with suggestions for better connecting LMs with the empirical study of child language acquisition.

* To appear in the GenBench 2023 workshop proceedings, the first workshop on (benchmarking) generalisation in NLP. GenBench 2023 will be held at EMNLP 2023 on December 6, 2023

Exploring Linguistic Probes for Morphological Generalization

Oct 20, 2023

Jordan Kodner, Salam Khalifa, Sarah Payne

Abstract:Modern work on the cross-linguistic computational modeling of morphological inflection has typically employed language-independent data splitting algorithms. In this paper, we supplement that approach with language-specific probes designed to test aspects of morphological generalization. Testing these probes on three morphologically distinct languages, English, Spanish, and Swahili, we find evidence that three leading morphological inflection systems employ distinct generalization strategies over conjugational classes and feature sets on both orthographic and phonologically transcribed inputs.

* To appear at EMNLP 2023

Why Linguistics Will Thrive in the 21st Century: A Reply to Piantadosi

Aug 06, 2023

Jordan Kodner, Sarah Payne, Jeffrey Heinz

Abstract:We present a critical assessment of Piantadosi's (2023) claim that "Modern language models refute Chomsky's approach to language," focusing on four main points. First, despite the impressive performance and utility of large language models (LLMs), humans achieve their capacity for language after exposure to several orders of magnitude less data. The fact that young children become competent, fluent speakers of their native languages with relatively little exposure to them is the central mystery of language learning to which Chomsky initially drew attention, and LLMs currently show little promise of solving this mystery. Second, what can the artificial reveal about the natural? Put simply, the implications of LLMs for our understanding of the cognitive structures and mechanisms underlying language and its acquisition are like the implications of airplanes for understanding how birds fly. Third, LLMs cannot constitute scientific theories of language for several reasons, not least of which is that scientific theories must provide interpretable explanations, not just predictions. This leads to our final point: to even determine whether the linguistic and cognitive capabilities of LLMs rival those of humans requires explicating what humans' capacities actually are. In other words, it requires a separate theory of language and cognition; generative linguistics provides precisely such a theory. As such, we conclude that generative linguistics as a scientific discipline will remain indispensable throughout the 21st century and beyond.

Morphological Inflection: A Reality Check

May 25, 2023

Jordan Kodner, Sarah Payne, Salam Khalifa, Zoey Liu

Abstract:Morphological inflection is a popular task in sub-word NLP with both practical and cognitive applications. For years now, state-of-the-art systems have reported high, but also highly variable, performance across data sets and languages. We investigate the causes of this high performance and high variability; we find several aspects of data set creation and evaluation which systematically inflate performance and obfuscate differences between languages. To improve generalizability and reliability of results, we propose new data sampling and evaluation strategies that better reflect likely use-cases. Using these new strategies, we make new observations on the generalization abilities of current inflection systems.

* To appear at ACL 2023

The Greedy and Recursive Search for Morphological Productivity

May 12, 2021

Caleb Belth, Sarah Payne, Deniz Beser, Jordan Kodner, Charles Yang

Abstract:As children acquire the knowledge of their language's morphology, they invariably discover the productive processes that can generalize to new words. Morphological learning is made challenging by the fact that even fully productive rules have exceptions, as in the well-known case of English past tense verbs, which features the -ed rule against the irregular verbs. The Tolerance Principle is a recent proposal that provides a precise threshold of exceptions that a productive rule can withstand. Its empirical application so far, however, requires the researcher to fully specify rules defined over a set of words. We propose a greedy search model that automatically hypothesizes rules and evaluates their productivity over a vocabulary. When the search for broader productivity fails, the model recursively subdivides the vocabulary and continues the search for productivity over narrower rules. Trained on psychologically realistic data from child-directed input, our model displays developmental patterns observed in child morphology acquisition, including the notoriously complex case of German noun pluralization. It also produces responses to nonce words that, despite receiving only a fraction of the training data, are more similar to those of human subjects than current neural network models' responses are.

* CogSci 2021
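
The exception threshold that the abstract attributes to the Tolerance Principle is usually stated as follows: a rule that applies to N items is productive if the number of exceptions does not exceed N / ln N. The sketch below checks only that threshold. The function names and the example counts are illustrative assumptions, and it does not implement the paper's greedy, recursive search.

```python
import math


def tolerance_threshold(n_items: int) -> float:
    """Maximum number of exceptions a rule over n_items can tolerate."""
    if n_items < 2:
        # ln(1) = 0, so the threshold is undefined for fewer than 2 items.
        raise ValueError("n_items must be at least 2")
    return n_items / math.log(n_items)


def is_productive(n_items: int, n_exceptions: int) -> bool:
    """Return True if the rule stays within the tolerance threshold."""
    return n_exceptions <= tolerance_threshold(n_items)


if __name__ == "__main__":
    # Made-up counts for illustration: a rule over 100 verbs.
    print(round(tolerance_threshold(100), 2))
    print(is_productive(100, 21), is_productive(100, 22))
```

Because the threshold grows more slowly than N, larger vocabularies tolerate a smaller proportion of exceptions, so a rule's status can change as the vocabulary it is evaluated over changes.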

Overestimation of Syntactic Representation in Neural Language Models

Apr 10, 2020

Jordan Kodner, Nitish Gupta

Abstract:With the advent of powerful neural language models over the last few years, research attention has increasingly focused on what aspects of language they represent that make them so successful. Several testing methodologies have been developed to probe models' syntactic representations. One popular method for determining a model's ability to induce syntactic structure trains a model on strings generated according to a template then tests the model's ability to distinguish such strings from superficially similar ones with different syntax. We illustrate a fundamental problem with this approach by reproducing positive results from a recent paper with two non-syntactic baseline language models: an n-gram model and an LSTM model trained on scrambled inputs.

* Accepted for publication at ACL 2020

Bootstrapping Transliteration with Constrained Discovery for Low-Resource Languages

Sep 20, 2018

Shyam Upadhyay, Jordan Kodner, Dan Roth

Abstract:Generating the English transliteration of a name written in a foreign script is an important and challenging step in multilingual knowledge acquisition and information extraction. Existing approaches to transliteration generation require a large (>5000) number of training examples. This difficulty contrasts with transliteration discovery, a somewhat easier task that involves picking a plausible transliteration from a given list. In this work, we present a bootstrapping algorithm that uses constrained discovery to improve generation, and can be used with as few as 500 training examples, which we show can be sourced from annotators in a matter of hours. This opens the task to languages for which large numbers of training examples are unavailable. We evaluate transliteration generation performance itself, as well as the improvement it brings to cross-lingual candidate generation for entity linking, a typical downstream task. We present a comprehensive evaluation of our approach on nine languages, each written in a unique script.

* EMNLP 2018
