Serge Sharoff

Reading Between the Lines: A dataset and a study on why some texts are tougher than others

Jan 03, 2025

Nouran Khallaf, Carlo Eugeni, Serge Sharoff

Abstract: Our research aims to better understand what makes a text difficult to read for specific audiences with intellectual disabilities, more specifically, people who have limitations in cognitive functioning, such as reading and understanding skills, an IQ below 70, and challenges in conceptual domains. We introduce a scheme for the annotation of difficulties which is based on empirical research in psychology as well as on research in translation studies. The paper describes the annotated dataset, primarily derived from the parallel texts (standard English and Easy to Read English translations) made available online. We fine-tuned four different pre-trained transformer models to perform the task of multiclass classification to predict the strategies required for simplification. We also investigate the possibility of interpreting the decisions of this language model when it is aimed at predicting the difficulty of sentences. The resources are available from https://github.com/Nouran-Khallaf/why-tough

* Published at Writing Aids at the Crossroads of AI, Cognitive Science and NLP WR-AI-CogS, at COLING'2025, Abu Dhabi

Controlling Out-of-Domain Gaps in LLMs for Genre Classification and Generated Text Detection

Dec 29, 2024

Dmitri Roussinov, Serge Sharoff, Nadezhda Puchnina

Abstract: This study demonstrates that the modern generation of Large Language Models (LLMs, such as GPT-4) suffers from the same out-of-domain (OOD) performance gap observed in prior research on pre-trained Language Models (PLMs, such as BERT). We demonstrate this across two non-topical classification tasks: 1) genre classification and 2) generated text detection. Our results show that when demonstration examples for In-Context Learning (ICL) come from one domain (e.g., travel) and the system is tested on another domain (e.g., history), classification performance declines significantly. To address this, we introduce a method that controls which predictive indicators are used and which are excluded during classification. For the two tasks studied here, this ensures that topical features are omitted, while the model is guided to focus on stylistic rather than content-based attributes. This approach reduces the OOD gap by up to 20 percentage points in a few-shot setup. Straightforward Chain-of-Thought (CoT) methods, used as the baseline, prove insufficient, while our approach consistently enhances domain transfer performance.

* The 31st International Conference on Computational Linguistics
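
The OOD gap mentioned in the abstract above can be read as the difference between in-domain and out-of-domain scores. A minimal Python sketch follows, assuming accuracy as the metric (the paper may use a different one); the helper names `accuracy` and `ood_gap` are hypothetical.

```python
def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and equal in length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def ood_gap(in_domain_acc, out_of_domain_acc):
    """Gap in percentage points between in-domain and out-of-domain accuracy.

    Assumption: the gap is a plain difference of two accuracy values.
    """
    return 100.0 * (in_domain_acc - out_of_domain_acc)
```

With purely illustrative values (not results from the paper), `ood_gap(0.75, 0.5)` gives a gap of 25 percentage points; a method that narrows the gap would bring the second argument closer to the first.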

BERT Goes Off-Topic: Investigating the Domain Transfer Challenge using Genre Classification

Nov 27, 2023

Dmitri Roussinov, Serge Sharoff

Abstract: While performance of many text classification tasks has been recently improved due to Pre-trained Language Models (PLMs), in this paper we show that they still suffer from a performance gap when the underlying distribution of topics changes. For example, a genre classifier trained on political topics often fails when tested on documents about sport or medicine. In this work, we quantify this phenomenon empirically with a large corpus and a large set of topics. Consequently, we verify that domain transfer remains challenging both for classic PLMs, such as BERT, and for modern large models, such as GPT-3. We also suggest and successfully test a possible remedy: after augmenting the training dataset with topically-controlled synthetic texts, the F1 score improves by up to 50% for some topics, nearing on-topic training results, while others show little to no improvement. While our empirical results focus on genre classification, our methodology is applicable to other classification tasks such as gender, authorship, or sentiment classification. The code and data to replicate the experiments are available at https://github.com/dminus1/genre

* Published at EMNLP'2023

Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation

Nov 18, 2023

Nurbanu Aksoy, Serge Sharoff, Selcuk Baser, Nishant Ravikumar, Alejandro F Frangi

Abstract: Image-to-text radiology report generation aims to automatically produce radiology reports that describe the findings in medical images. Most existing methods focus solely on the image data, disregarding the other patient information accessible to radiologists. In this paper, we present a novel multi-modal deep neural network framework for generating chest X-ray reports by integrating structured patient data, such as vital signs and symptoms, alongside unstructured clinical notes. We introduce a conditioned cross-multi-head attention module to fuse these heterogeneous data modalities, bridging the semantic gap between visual and textual data. Experiments demonstrate substantial improvements from using additional modalities compared to relying on images alone. Notably, our model achieves the highest reported performance on the ROUGE-L metric compared to relevant state-of-the-art models in the literature. Furthermore, we employed both human evaluation and clinical semantic similarity measurement alongside word-overlap metrics to improve the depth of quantitative analysis. A human evaluation, conducted by a board-certified radiologist, confirms the model's accuracy in identifying high-level findings; however, it also highlights that more improvement is needed to capture nuanced details and clinical context.

Syntactic Knowledge via Graph Attention with BERT in Machine Translation

May 22, 2023

Yuqian Dai, Serge Sharoff, Marc de Kamps

Abstract: Although the Transformer model can effectively acquire context features via a self-attention mechanism, deeper syntactic knowledge is still not effectively modeled. To alleviate the above problem, we propose Syntactic knowledge via Graph attention with BERT (SGB) in Machine Translation (MT) scenarios. Graph Attention Network (GAT) and BERT jointly represent syntactic dependency features as explicit knowledge of the source language to enrich source language representations and guide target language generation. Our experiments use gold syntax-annotation sentences and a Quality Estimation (QE) model to obtain interpretability of translation quality improvement regarding syntactic knowledge without being limited to a BLEU score. Experiments show that the proposed SGB engines improve translation quality across the three MT tasks without sacrificing BLEU scores. We investigate what length of source sentences benefits the most and what dependencies are better identified by the SGB engines. We also find that learning of specific dependency relations by GAT can be reflected in the translation quality containing such relations and that syntax on the graph leads to new modeling of syntactic aspects of source sentences in the middle and bottom layers of BERT.

GATology for Linguistics: What Syntactic Dependencies It Knows

May 22, 2023

Yuqian Dai, Serge Sharoff, Marc de Kamps

Abstract: Graph Attention Network (GAT) is a graph neural network which is one of the strategies for modeling and representing explicit syntactic knowledge and can work with pre-trained models, such as BERT, in downstream tasks. Currently, there is still a lack of investigation into how GAT learns syntactic knowledge from the perspective of model structure. As one of the strategies for modeling explicit syntactic knowledge, GAT and BERT have never been applied and discussed in Machine Translation (MT) scenarios. We design a dependency relation prediction task to study how GAT learns syntactic knowledge of three languages as a function of the number of attention heads and layers. We also use a paired t-test and F1-score to clarify the differences in syntactic dependency prediction between GAT and BERT fine-tuned by the MT task (MT-B). The experiments show that better performance can be achieved by appropriately increasing the number of attention heads with two GAT layers. With more than two layers, learning suffers. Moreover, GAT is more competitive in training speed and syntactic dependency prediction than MT-B, which may reveal a better incorporation of modeling explicit syntactic knowledge and the possibility of combining GAT and BERT in the MT tasks.

Estimating Confidence of Predictions of Individual Classifiers and Their Ensembles for the Genre Classification Task

Jun 15, 2022

Mikhail Lepekhin, Serge Sharoff

Abstract: Genre identification is a subclass of non-topical text classification. The main difference between this task and topical classification is that genres, unlike topics, usually do not correspond to simple keywords, and thus they need to be defined in terms of their functions in communication. Neural models based on pre-trained transformers, such as BERT or XLM-RoBERTa, demonstrate SOTA results in many NLP tasks, including non-topical classification. However, in many cases, their downstream application to very large corpora, such as those extracted from social media, can lead to unreliable results because of dataset shifts, when some raw texts do not match the profile of the training set. To mitigate this problem, we experiment with individual models as well as with their ensembles. To evaluate the robustness of all models we use a prediction confidence metric, which estimates the reliability of a prediction in the absence of a gold standard label. We can evaluate robustness via the confidence gap between the correctly classified texts and the misclassified ones on a labeled test corpus; higher gaps make it easier to improve our confidence that our classifier made the right decision. Our results show that for all of the classifiers tested in this study, there is a confidence gap, but for the ensembles, the gap is bigger, meaning that ensembles are more robust than their individual models.

* Published at LREC, https://aclanthology.org/2022.lrec-1.642
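
The confidence gap described in the abstract above can be illustrated with a minimal Python sketch. It assumes that each prediction comes with a single confidence score (for instance a maximum softmax probability) and that the gap is the difference between the mean confidence of correct and of incorrect predictions; the paper's exact metric may differ, and the function name `confidence_gap` is hypothetical.

```python
from statistics import mean


def confidence_gap(confidences, predictions, gold):
    """Mean confidence on correct predictions minus mean confidence on errors.

    Assumption: one confidence score per prediction, higher means more certain.
    """
    correct = [c for c, p, g in zip(confidences, predictions, gold) if p == g]
    wrong = [c for c, p, g in zip(confidences, predictions, gold) if p != g]
    if not correct or not wrong:
        raise ValueError("need at least one correct and one incorrect prediction")
    return mean(correct) - mean(wrong)
```

Under these assumptions, a larger positive value means that low confidence is a more useful warning sign of a likely misclassification on unlabeled data.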

Towards Arabic Sentence Simplification via Classification and Generative Approaches

Apr 20, 2022

Nouran Khallaf, Serge Sharoff

Abstract: This paper presents an attempt to build a Modern Standard Arabic (MSA) sentence-level simplification system. We experimented with sentence simplification using two approaches: (i) a classification approach leading to lexical simplification pipelines which use Arabic-BERT, a pre-trained contextualised model, as well as a model of fastText word embeddings; and (ii) a generative approach, a Seq2Seq technique applying the multilingual Text-to-Text Transfer Transformer (mT5). We developed our training corpus by aligning the original and simplified sentences from the internationally acclaimed Arabic novel "Saaq al-Bambuu". We evaluate the effectiveness of these methods by comparing the generated simple sentences to the target simple sentences using the BERTScore evaluation metric. The simple sentences produced by the mT5 model achieve P 0.72, R 0.68 and F-1 0.70 via BERTScore, while combining Arabic-BERT and fastText achieves P 0.97, R 0.97 and F-1 0.97. In addition, we report a manual error analysis for these experiments. https://github.com/Nouran-Khallaf/Lexical_Simplification
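
The precision, recall and F-1 figures quoted above are consistent with F-1 as the harmonic mean of P and R. A minimal Python check follows, assuming the reported values are corpus-level averages; BERTScore is computed per sentence pair and then averaged, so the identity need not hold exactly for averaged scores. The helper name `f_measure` is hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F-1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract above, rounded to two decimals.
mt5_f1 = round(f_measure(0.72, 0.68), 2)       # mT5 model
pipeline_f1 = round(f_measure(0.97, 0.97), 2)  # Arabic-BERT + fastText
```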

Experiments with adversarial attacks on text genres

Jul 05, 2021

Mikhail Lepekhin, Serge Sharoff

Abstract: Neural models based on pre-trained transformers, such as BERT or XLM-RoBERTa, demonstrate SOTA results in many NLP tasks, including non-topical classification, such as genre identification. However, these approaches often exhibit low reliability under minor alterations of the test texts. A related problem concerns topical biases in the training corpus: for example, the prevalence of words on a specific topic in a specific genre can trick the genre classifier into recognising any text on this topic as belonging to this genre. In order to mitigate the reliability problem, this paper investigates techniques for attacking genre classifiers to understand the limitations of the transformer models and to improve their performance. While simple text attacks, such as those based on word replacement using keywords extracted by tf-idf, are not capable of deceiving powerful models like XLM-RoBERTa, we show that embedding-based algorithms which can replace some of the most "significant" words with words similar to them, for example, TextFooler, have the ability to influence model predictions in a significant proportion of cases.

Automatic Difficulty Classification of Arabic Sentences

Mar 07, 2021

Nouran Khallaf, Serge Sharoff

Abstract: In this paper, we present a Modern Standard Arabic (MSA) sentence difficulty classifier, which predicts the difficulty of sentences for language learners using either the CEFR proficiency levels or the binary classification as simple or complex. We compare the use of sentence embeddings of different kinds (fastText, mBERT, XLM-R and Arabic-BERT), as well as traditional language features such as POS tags, dependency trees, readability scores and frequency lists for language learners. Our best results have been achieved using fine-tuned Arabic-BERT. Our 3-way CEFR classification achieves F-1 of 0.80 and 0.75 for Arabic-BERT and XLM-R classification respectively, and 0.71 Spearman correlation for regression. Our binary difficulty classifier reaches F-1 0.94, and the sentence-pair semantic similarity classifier reaches F-1 0.98.

* The Sixth Arabic Natural Language Processing Workshop (WANLP 2021)
* Accepted at WANLP 2021
