Zhengxiang Wang

LLMs can Perform Multi-Dimensional Analytic Writing Assessments: A Case Study of L2 Graduate-Level Academic English Writing

Feb 17, 2025

Zhengxiang Wang, Veronika Makarova, Zhi Li, Jordan Kodner, Owen Rambow

Abstract: The paper explores the performance of LLMs in the context of multi-dimensional analytic writing assessments, i.e., their ability to provide both scores and comments based on multiple assessment criteria. Using a corpus of literature reviews written by L2 graduate students and assessed by human experts against 9 analytic criteria, we prompt several popular LLMs to perform the same task under various conditions. To evaluate the quality of feedback comments, we apply a novel feedback comment quality evaluation framework. This framework is interpretable, cost-efficient, scalable, and reproducible, compared to existing methods that rely on manual judgments. We find that LLMs can generate reasonably good and generally reliable multi-dimensional analytic assessments. We release our corpus for reproducibility.

* 26 pages, 6 figures, 15 tables

Evaluating LLMs with Multiple Problems at once: A New Paradigm for Probing LLM Capabilities

Jun 16, 2024

Zhengxiang Wang, Jordan Kodner, Owen Rambow

Abstract: Current LLM evaluation predominantly uses prompts comprising single problems. We propose multi-problem evaluation as an additional approach to study the multiple problem handling capabilities of LLMs. We present a systematic study in this regard by comprehensively examining 7 LLMs on 4 related types of tasks constructed from 6 classification benchmarks. The 4 task types include traditional single-problem tasks, homogeneous multi-problem tasks, and two index selection tasks that embed the multi-problem tasks. We find that LLMs are competent multi-problem solvers: they generally perform (nearly) as well on multi-problem tasks as on single-problem tasks. Furthermore, contrary to common expectation, they often do not suffer from a positional bias with long inputs. This makes multi-problem prompting a simple and cost-efficient prompting method of practical significance. However, our results also strongly indicate that LLMs lack true understanding: they perform significantly worse in the two index selection tasks than in the multi-problem task under various evaluation settings, although they can indeed do index selection in general.

* 20 pages, 15 figures, 9 tables
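
The multi-problem prompting idea described in this abstract can be illustrated with a minimal sketch. The function name `build_multi_problem_prompt`, the numbering scheme, and the answer-format instruction are all assumptions made for illustration; the abstract does not specify the prompt template the authors used.

```python
from typing import List


def build_multi_problem_prompt(instruction: str, problems: List[str]) -> str:
    """Pack several classification problems into one prompt.

    Hypothetical template: the layout below is an assumption,
    not the format used in the paper.
    """
    lines = [instruction, ""]
    # Number each problem so answers can be matched back by index.
    for i, text in enumerate(problems, start=1):
        lines.append(f"{i}. {text}")
    lines.append("")
    lines.append("Answer each numbered problem on its own line as '<number>: <label>'.")
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_multi_problem_prompt(
        "Classify the sentiment of each text as positive or negative.",
        ["great movie", "terrible plot"],
    )
    print(prompt)
```

A single-problem prompt is the special case where `problems` has one element, which makes it straightforward to compare the two settings on the same benchmark items.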

Clustering Document Parts: Detecting and Characterizing Influence Campaigns From Documents

Feb 27, 2024

Zhengxiang Wang, Owen Rambow

Abstract: We propose a novel clustering pipeline to detect and characterize influence campaigns from documents. This approach clusters parts of documents, detects clusters that likely reflect an influence campaign, and then identifies documents linked to an influence campaign via their association with the high-influence clusters. Our approach outperforms both the direct document-level classification and the direct document-level clustering approach in predicting if a document is part of an influence campaign. We propose various novel techniques to enhance our pipeline, including using an existing event factuality prediction system to obtain document parts, and aggregating multiple clustering experiments to improve the performance of both cluster and document classification. Classifying documents on top of clustering not only accurately extracts the parts of the documents that are relevant to influence campaigns, but also captures influence campaigns as a coordinated and holistic phenomenon. Our approach makes possible more fine-grained and interpretable characterizations of influence campaigns from documents.

* 12 pages, 2 figures, 5 tables

Probabilistic Linguistic Knowledge and Token-level Text Augmentation

Jul 03, 2023

Zhengxiang Wang

Abstract: This paper investigates the effectiveness of token-level text augmentation and the role of probabilistic linguistic knowledge within a linguistically-motivated evaluation context. Two text augmentation programs, REDA and REDA$_{NG}$, were developed, both implementing five token-level text editing operations: Synonym Replacement (SR), Random Swap (RS), Random Insertion (RI), Random Deletion (RD), and Random Mix (RM). REDA$_{NG}$ leverages pretrained $n$-gram language models to select the most likely augmented texts from REDA's output. Comprehensive and fine-grained experiments were conducted on a binary question matching classification task in both Chinese and English. The results strongly refute the general effectiveness of the five token-level text augmentation techniques under investigation, whether applied together or separately, and irrespective of various common classification model types used, including transformers. Furthermore, the role of probabilistic linguistic knowledge is found to be minimal.

* 20 pages; 3 figures; 8 tables
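
Two of the token-level editing operations named in this abstract, Random Swap and Random Deletion, can be sketched as follows. This is a generic illustration and not the REDA implementation: the function names, the `n_swaps` and `p` parameters, and the rule of keeping at least one token are assumptions.

```python
import random
from typing import List, Optional


def random_swap(tokens: List[str], n_swaps: int = 1,
                rng: Optional[random.Random] = None) -> List[str]:
    """Swap two randomly chosen positions, n_swaps times (assumed behavior)."""
    if rng is None:
        rng = random.Random()
    out = list(tokens)
    if len(out) < 2:
        return out  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)  # two distinct indices
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens: List[str], p: float = 0.1,
                    rng: Optional[random.Random] = None) -> List[str]:
    """Drop each token independently with probability p.

    Assumption: if every token would be dropped, keep one at random
    so the augmented text is never empty.
    """
    if rng is None:
        rng = random.Random()
    tokens = list(tokens)
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for repeatable runs
    sentence = "how do I reset my password".split()
    print(random_swap(sentence, n_swaps=1, rng=rng))
    print(random_deletion(sentence, p=0.2, rng=rng))
```

Neither operation checks whether the edited text still means the same thing as the original, which is the kind of limitation the abstract's negative findings concern.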

Learning Transductions and Alignments with RNN Seq2seq Models

Mar 13, 2023

Zhengxiang Wang

Abstract: The paper studies the capabilities of Recurrent-Neural-Network sequence-to-sequence (RNN seq2seq) models in learning four string-to-string transduction tasks: identity, reversal, total reduplication, and input-specified reduplication. These transductions are traditionally well studied under finite state transducers and attributed with varying complexity. We find that RNN seq2seq models are only able to approximate a mapping that fits the training or in-distribution data. Attention helps significantly, but does not solve the out-of-distribution generalization limitation. Task complexity and RNN variants also play a role in the results. Our results are best understood in terms of the complexity hierarchy of formal languages as opposed to that of string transductions.

* 24 pages; 9 figures; 7 tables
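
The four transduction tasks named in this abstract can be written as plain reference functions, shown here as a minimal sketch. The first three follow directly from their names. The encoding used for input-specified reduplication (a trailing run of a marker symbol `@`, whose length gives the number of extra copies) is an assumption made for illustration; the abstract does not state the encoding.

```python
def identity(w: str) -> str:
    """Map a string to itself: w -> w."""
    return w


def reversal(w: str) -> str:
    """Map a string to its mirror image: w -> w^R."""
    return w[::-1]


def total_reduplication(w: str) -> str:
    """Copy the whole string once: w -> ww."""
    return w + w


def input_specified_reduplication(w: str, marker: str = "@") -> str:
    """Copy the base string as many extra times as there are trailing markers.

    Assumed encoding: 'ab@@' means base 'ab' with two extra copies.
    """
    base = w.rstrip(marker)
    n_extra = len(w) - len(base)  # number of trailing marker symbols
    return base * (n_extra + 1)


if __name__ == "__main__":
    for fn in (identity, reversal, total_reduplication):
        print(fn.__name__, fn("abc"))
    print(input_specified_reduplication("ab@@"))
```

Functions like these can serve as oracles for generating training pairs and for scoring model outputs on strings longer than those seen in training.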

Random Text Perturbations Work, but not Always

Sep 02, 2022

Zhengxiang Wang

Abstract: We present three large-scale experiments on a binary text matching classification task in both Chinese and English to evaluate the effectiveness and generalizability of random text perturbations as a data augmentation approach for NLP. We find that the augmentation can have both negative and positive effects on the test set performance of three neural classification models, depending on whether the models are trained on enough original training examples. This remains true whether the five random text editing operations used to augment text are applied together or separately. Our study strongly suggests that the effectiveness of random text perturbations is task-specific and not generally positive.

* 6 pages; 6 tables; 2 figures

Linguistic Knowledge in Data Augmentation for Natural Language Processing: An Example on Chinese Question Matching

Dec 15, 2021

Zhengxiang Wang

Abstract: To investigate the role of linguistic knowledge in data augmentation (DA) for Natural Language Processing (NLP), particularly whether more linguistic knowledge leads to a better DA approach, we designed two adapted DA programs and applied them to LCQMC (a Large-scale Chinese Question Matching Corpus) for a binary Chinese question matching classification task. The two DA programs produce augmented texts by five simple text editing operations (or DA techniques), largely irrespective of language generation rules, but one is enhanced with a pre-trained n-gram language model to fuse it with prior linguistic knowledge. We then trained four neural network models (BOW, CNN, LSTM-RNN, and GRU-RNN) and a pre-trained model (ERNIE-Gram) on the LCQMC train sets of varying size as well as the related augmented train sets produced by the two DA programs. The test set performances of the five classification models show that adding probabilistic linguistic knowledge as constraints does not make the base DA program better, since there are no significant performance differences between the models trained on the two types of augmented train sets, whether the five DA techniques are applied together or separately. Moreover, because the five DA techniques cannot produce strictly paraphrastic augmented texts, the results indicate that the classification models need sufficient amounts of training examples to mitigate the negative impact of falsely matching augmented text pairs and to improve performance, a limitation of random text editing perturbations used as a DA approach.

* 13 pages; 5 tables; 3 figures
