Dmitri Roussinov

Large Language Models and Arabic Content: A Review

May 12, 2025

Haneh Rhel, Dmitri Roussinov

Abstract: Over the past three years, the rapid advancement of Large Language Models (LLMs) has had a profound impact on multiple areas of Artificial Intelligence (AI), particularly in Natural Language Processing (NLP) across diverse languages, including Arabic. Although Arabic is considered one of the most widely spoken languages, used across 27 countries in the Arab world and as a second language in some non-Arab countries, Arabic resources, datasets, and tools remain scarce. Arabic NLP tasks face various challenges due to the complexities of the Arabic language, including its rich morphology, intricate structure, and diverse writing standards, among other factors. Researchers have been actively addressing these challenges, demonstrating that pre-trained LLMs trained on multilingual corpora achieve significant success in various Arabic NLP tasks. This study provides an overview of using LLMs for the Arabic language, highlighting early pre-trained Arabic language models across various NLP applications and their ability to handle diverse Arabic content tasks and dialects. It also describes how techniques such as fine-tuning and prompt engineering can enhance the performance of these models. Additionally, the study summarizes common Arabic benchmarks and datasets and presents our observations on the persistent upward trend in the adoption of LLMs.

* Original language: English. This paper was submitted to the First International Conference on Artificial Intelligence and Generative AI (FICAILY 2025) and has been accepted for presentation at FICAILY on 9-10 July 2025 and for publication by Springer Nature. Number of pages: 16. Publication status: Accepted/In press, 7 Apr 2025. https://www.gena-ai-libya2025.com/

Controlling Out-of-Domain Gaps in LLMs for Genre Classification and Generated Text Detection

Dec 29, 2024

Dmitri Roussinov, Serge Sharoff, Nadezhda Puchnina

Abstract: This study demonstrates that the modern generation of Large Language Models (LLMs, such as GPT-4) suffers from the same out-of-domain (OOD) performance gap observed in prior research on pre-trained Language Models (PLMs, such as BERT). We demonstrate this across two non-topical classification tasks: 1) genre classification and 2) generated text detection. Our results show that when demonstration examples for In-Context Learning (ICL) come from one domain (e.g., travel) and the system is tested on another domain (e.g., history), classification performance declines significantly. To address this, we introduce a method that controls which predictive indicators are used and which are excluded during classification. For the two tasks studied here, this ensures that topical features are omitted, while the model is guided to focus on stylistic rather than content-based attributes. This approach reduces the OOD gap by up to 20 percentage points in a few-shot setup. Straightforward Chain-of-Thought (CoT) methods, used as the baseline, prove insufficient, while our approach consistently enhances domain transfer performance.

* The 31st International Conference on Computational Linguistics

NLP-based Regulatory Compliance -- Using GPT 4.0 to Decode Regulatory Documents

Dec 29, 2024

Bimal Kumar, Dmitri Roussinov

Abstract: Large Language Models (LLMs) such as GPT-4.0 have shown significant promise in addressing the semantic complexities of regulatory documents, particularly in detecting inconsistencies and contradictions. This study evaluates GPT-4.0's ability to identify conflicts within regulatory requirements by analyzing a curated corpus with artificially injected ambiguities and contradictions, designed in collaboration with architects and compliance engineers. Using metrics such as precision, recall, and F1 score, the experiment demonstrates GPT-4.0's effectiveness in detecting inconsistencies, with findings validated by human experts. The results highlight the potential of LLMs to enhance regulatory compliance processes, though further testing with larger datasets and domain-specific fine-tuning is needed to maximize accuracy and practical applicability. Future work will explore automated conflict resolution and real-world implementation through pilot projects with industry partners.

* Accepted for presentation at the Georg Nemetschek Institute Symposium & Expo on Artificial Intelligence for the Built World, Munich, Germany, 12 Sept 2024

BERT Goes Off-Topic: Investigating the Domain Transfer Challenge using Genre Classification

Nov 27, 2023

Dmitri Roussinov, Serge Sharoff

Abstract: While performance of many text classification tasks has been recently improved due to Pre-trained Language Models (PLMs), in this paper we show that they still suffer from a performance gap when the underlying distribution of topics changes. For example, a genre classifier trained on political topics often fails when tested on documents about sport or medicine. In this work, we quantify this phenomenon empirically with a large corpus and a large set of topics. Consequently, we verify that domain transfer remains challenging both for classic PLMs, such as BERT, and for modern large models, such as GPT-3. We also suggest and successfully test a possible remedy: after augmenting the training dataset with topically-controlled synthetic texts, the F1 score improves by up to 50% for some topics, nearing on-topic training results, while others show little to no improvement. While our empirical results focus on genre classification, our methodology is applicable to other classification tasks such as gender, authorship, or sentiment classification. The code and data to replicate the experiments are available at https://github.com/dminus1/genre

* Published at EMNLP'2023
