Nouran Khallaf

Reading Between the Lines: A dataset and a study on why some texts are tougher than others

Jan 03, 2025

Nouran Khallaf, Carlo Eugeni, Serge Sharoff

Abstract: Our research aims at better understanding what makes a text difficult to read for specific audiences with intellectual disabilities, more specifically, people who have limitations in cognitive functioning, such as reading and understanding skills, an IQ below 70, and challenges in conceptual domains. We introduce a scheme for the annotation of difficulties which is based on empirical research in psychology as well as on research in translation studies. The paper describes the annotated dataset, primarily derived from parallel texts (standard English and Easy to Read English translations) made available online. We fine-tuned four different pre-trained transformer models to perform multiclass classification, predicting the strategies required for simplification. We also investigate the possibility of interpreting the decisions of this language model when it is used to predict the difficulty of sentences. The resources are available from https://github.com/Nouran-Khallaf/why-tough

* Published at Writing Aids at the Crossroads of AI, Cognitive Science and NLP WR-AI-CogS, at COLING'2025, Abu Dhabi


Towards Arabic Sentence Simplification via Classification and Generative Approaches

Apr 20, 2022

Nouran Khallaf, Serge Sharoff

Abstract: This paper presents an attempt to build a Modern Standard Arabic (MSA) sentence-level simplification system. We experimented with sentence simplification using two approaches: (i) a classification approach leading to lexical simplification pipelines which use Arabic-BERT, a pre-trained contextualised model, as well as a model of fastText word embeddings; and (ii) a generative approach, a Seq2Seq technique applying the multilingual Text-to-Text Transfer Transformer (mT5). We developed our training corpus by aligning the original and simplified sentences from the internationally acclaimed Arabic novel "Saaq al-Bambuu". We evaluate the effectiveness of these methods by comparing the generated simple sentences to the target simple sentences using the BERTScore evaluation metric. The simple sentences produced by the mT5 model achieve P 0.72, R 0.68 and F-1 0.70 via BERTScore, while combining Arabic-BERT and fastText achieves P 0.97, R 0.97 and F-1 0.97. In addition, we report a manual error analysis for these experiments. https://github.com/Nouran-Khallaf/Lexical_Simplification
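
As a rough sanity check on the scores quoted above, the sketch below recomputes F-1 as the harmonic mean of the reported precision and recall. The function name is hypothetical and not taken from the authors' repository. BERTScore computes its F measure per sentence pair before averaging, so corpus-level figures need not match this calculation exactly; for the two systems reported here they agree to two decimal places.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract above.
reported = {
    "mT5": (0.72, 0.68),
    "Arabic-BERT + fastText": (0.97, 0.97),
}

for system, (p, r) in reported.items():
    print(f"{system}: F-1 = {f1_from_precision_recall(p, r):.2f}")
```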


Automatic Difficulty Classification of Arabic Sentences

Mar 07, 2021

Nouran Khallaf, Serge Sharoff

Abstract: In this paper, we present a Modern Standard Arabic (MSA) sentence difficulty classifier, which predicts the difficulty of sentences for language learners using either the CEFR proficiency levels or a binary classification as simple or complex. We compare the use of sentence embeddings of different kinds (fastText, mBERT, XLM-R and Arabic-BERT), as well as traditional language features such as POS tags, dependency trees, readability scores and frequency lists for language learners. Our best results have been achieved using fine-tuned Arabic-BERT. Our 3-way CEFR classification achieves F-1 of 0.80 and 0.75 for Arabic-BERT and XLM-R classification respectively, and 0.71 Spearman correlation for regression. Our binary difficulty classifier reaches F-1 of 0.94, and the sentence-pair semantic similarity classifier reaches F-1 of 0.98.

* Accepted at the Sixth Arabic Natural Language Processing Workshop (WANLP 2021)
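
The abstract above reports a Spearman correlation for the regression setting. As a minimal sketch of what that metric measures, the code below computes Spearman correlation as the Pearson correlation of rank vectors, with tied values sharing their average rank. The function names and the toy gold/predicted values are hypothetical illustrations, not data or code from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation; assumes neither input is constant."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical gold difficulty levels and regression outputs for six sentences.
gold = [1, 1, 2, 2, 3, 3]
predicted = [1.2, 0.8, 2.5, 1.9, 2.7, 3.1]
print(f"Spearman correlation: {spearman(gold, predicted):.2f}")
```

Because the metric depends only on rank order, a regressor can score well even when its raw outputs are not calibrated to the level scale.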
