Hend Al-Khalifa

The Prompting Brain: Neurocognitive Markers of Expertise in Guiding Large Language Models

Aug 20, 2025

Hend Al-Khalifa, Raneem Almansour, Layan Abdulrahman Alhuasini, Alanood Alsaleh, Mohamad-Hani Temsah, Ashwag Rafea S Alruwaili

Abstract:Prompt engineering has rapidly emerged as a critical skill for effective interaction with large language models (LLMs). However, the cognitive and neural underpinnings of this expertise remain largely unexplored. This paper presents findings from a cross-sectional pilot fMRI study investigating differences in brain functional connectivity and network activity between experts and intermediate prompt engineers. Our results reveal distinct neural signatures associated with higher prompt engineering literacy, including increased functional connectivity in brain regions such as the left middle temporal gyrus and the left frontal pole, as well as altered power-frequency dynamics in key cognitive networks. These findings offer initial insights into the neurobiological basis of prompt engineering proficiency. We discuss the implications of these neurocognitive markers in Natural Language Processing (NLP). Understanding the neural basis of human expertise in interacting with LLMs can inform the design of more intuitive human-AI interfaces, contribute to cognitive models of LLM interaction, and potentially guide the development of AI systems that better align with human cognitive workflows. This interdisciplinary approach aims to bridge the gap between human cognition and machine intelligence, fostering a deeper understanding of how humans learn and adapt to complex AI systems.

MultiProSE: A Multi-label Arabic Dataset for Propaganda, Sentiment, and Emotion Detection

Feb 12, 2025

Lubna Al-Henaki, Hend Al-Khalifa, Abdulmalik Al-Salman, Hajar Alqubayshi, Hind Al-Twailay, Gheeda Alghamdi, Hawra Aljasim

Abstract:Propaganda is a form of persuasion that has been used throughout history with the goal of influencing people's opinions through rhetorical and psychological persuasion techniques for determined ends. Although Arabic ranks as the fourth most-used language on the internet, resources for propaganda detection in languages other than English, especially Arabic, remain extremely limited. To address this gap, the first Arabic dataset for Multi-label Propaganda, Sentiment, and Emotion (MultiProSE) has been introduced. MultiProSE is an open-source extension of the existing Arabic propaganda dataset, ArPro, with the addition of sentiment and emotion annotations for each text. This dataset comprises 8,000 annotated news articles, making it the largest propaganda dataset to date. For each task, several baselines have been developed using large language models (LLMs), such as GPT-4o-mini, and pre-trained language models (PLMs), including three BERT-based models. The dataset, annotation guidelines, and source code are all publicly released to facilitate future research and development in Arabic language models and contribute to a deeper understanding of how various opinion dimensions interact in news media.

* 12 pages, 3 figures, 4 tables

GLARE: Google Apps Arabic Reviews Dataset

Dec 16, 2024

Fatima AlGhamdi, Reem Mohammed, Hend Al-Khalifa, Areeb Alowisheq

Abstract:This paper introduces GLARE, an Arabic app reviews dataset collected from the Saudi Google Play Store. It consists of 76M reviews, 69M of which are Arabic reviews of 9,980 Android applications. We present the data collection methodology, along with a detailed Exploratory Data Analysis (EDA) and Feature Engineering on the gathered reviews. We also highlight possible use cases and benefits of the dataset.

* GitHub repo: https://github.com/Fatima-Gh/GLARE; Zenodo: https://zenodo.org/records/6457824

A Survey of Large Language Models for Arabic Language and its Dialects

Oct 26, 2024

Malak Mashaabi, Shahad Al-Khalifa, Hend Al-Khalifa

Abstract:This survey offers a comprehensive overview of Large Language Models (LLMs) designed for the Arabic language and its dialects. It covers key architectures, including encoder-only, decoder-only, and encoder-decoder models, along with the datasets used for pre-training, spanning Classical Arabic, Modern Standard Arabic, and Dialectal Arabic. The study also explores monolingual, bilingual, and multilingual LLMs, analyzing their architectures and performance across downstream tasks, such as sentiment analysis, named entity recognition, and question answering. Furthermore, it assesses the openness of Arabic LLMs based on factors such as source code availability, training data, model weights, and documentation. The survey highlights the need for more diverse dialectal datasets and emphasizes the importance of openness for research reproducibility and transparency. It concludes by identifying key challenges and opportunities for future research and stressing the need for more inclusive and representative models.

CLEANANERCorp: Identifying and Correcting Incorrect Labels in the ANERcorp Dataset

Aug 22, 2024

Mashael Al-Duwais, Hend Al-Khalifa, Abdulmalik Al-Salman

Abstract:Label errors are a common issue in machine learning datasets, particularly for tasks such as Named Entity Recognition. Such label errors might hurt model training, affect evaluation results, and lead to an inaccurate assessment of model performance. In this study, we examined in depth one of the widely adopted Arabic NER benchmark datasets (ANERcorp) and found a significant number of annotation errors, missing labels, and inconsistencies. We therefore conducted empirical research to understand these errors, correct them, and propose a cleaner version of the dataset named CLEANANERCorp. CLEANANERCorp will serve the research community as a more accurate and consistent benchmark.

* ELRA and ICCL 2024
* Proceedings of the 6th Workshop on Open-Source Arabic Corpora and Processing Tools (OSACT) with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation @ LREC-COLING 2024

The Qiyas Benchmark: Measuring ChatGPT Mathematical and Language Understanding in Arabic

Jun 28, 2024

Shahad Al-Khalifa, Hend Al-Khalifa

Abstract:Despite the growing importance of Arabic as a global language, there is a notable lack of language models pre-trained exclusively on Arabic data. This shortage has led to limited benchmarks available for assessing language model performance in Arabic. To address this gap, we introduce two novel benchmarks designed to evaluate models' mathematical reasoning and language understanding abilities in Arabic. These benchmarks are derived from a General Aptitude Test (GAT) called the Qiyas exam, a standardized test widely used for university admissions in Saudi Arabia. For validation purposes, we assess the performance of ChatGPT-3.5-turbo and ChatGPT-4 on our benchmarks. Our findings reveal that these benchmarks pose a significant challenge, with ChatGPT-4 achieving an overall average accuracy of 64%, while ChatGPT-3.5-turbo achieved an overall accuracy of 49% across the various question types in the Qiyas benchmark. We believe the release of these benchmarks will pave the way for enhancing the mathematical reasoning and language understanding capabilities of future models tailored for the low-resource Arabic language.

Towards Designing a ChatGPT Conversational Companion for Elderly People

Apr 18, 2023

Abeer Alessa, Hend Al-Khalifa

Abstract:Loneliness and social isolation are serious and widespread problems among older people, affecting their physical and mental health, quality of life, and longevity. In this paper, we propose a ChatGPT-based conversational companion system for elderly people. The system is designed to provide companionship and help reduce feelings of loneliness and social isolation. The system was evaluated with a preliminary study. The results showed that the system was able to generate responses that were relevant to the created elderly personas. However, it is essential to acknowledge the limitations of ChatGPT, such as potential biases and misinformation, and to consider the ethical implications of using AI-based companionship for the elderly, including privacy concerns.

* 10 pages, 3 Figures, Workshop paper

The Saudi Privacy Policy Dataset

Apr 05, 2023

Hend Al-Khalifa, Malak Mashaabi, Ghadi Al-Yahya, Raghad Alnashwan

Abstract:This paper introduces the Saudi Privacy Policy Dataset, a diverse compilation of Arabic privacy policies from various sectors in Saudi Arabia, annotated according to the 10 principles of the Personal Data Protection Law (PDPL). The PDPL was established to be compatible with the General Data Protection Regulation (GDPR), one of the most comprehensive data regulations worldwide. Data were collected from multiple sources, including the Saudi Central Bank, the Saudi Arabia National United Platform, the Council of Health Insurance, and general websites using Google and Wikipedia. The final dataset includes 1,000 websites belonging to 7 sectors, 4,638 lines of text, 775,370 tokens, and a corpus size of 8,353 KB. The annotated dataset offers significant reuse potential for assessing privacy policy compliance, benchmarking privacy practices across industries, and developing automated tools for monitoring adherence to data protection regulations. By providing a comprehensive and annotated dataset of privacy policies, this paper aims to facilitate further research and development in the areas of privacy policy analysis, natural language processing, and machine learning applications related to privacy and data protection, while also serving as an essential resource for researchers, policymakers, and industry professionals interested in understanding and promoting compliance with privacy regulations in Saudi Arabia.

* 8 pages, 1 figure

Natural Language Processing in Customer Service: A Systematic Review

Dec 16, 2022

Malak Mashaabi, Areej Alotaibi, Hala Qudaih, Raghad Alnashwan, Hend Al-Khalifa

Abstract:Artificial intelligence and natural language processing (NLP) are increasingly being used in customer service to interact with users and answer their questions. The goal of this systematic review is to examine existing research on the use of NLP technology in customer service, including the research domain, applications, datasets used, and evaluation methods. The review also looks at the future direction of the field and any significant limitations. The review covers the time period from 2015 to 2022 and includes papers from five major scientific databases. Chatbots and question-answering systems were found to be used in 10 main fields, with the most common use in general, social networking, and e-commerce areas. Twitter was the second most commonly used dataset, with most research also using their own original datasets. Accuracy, precision, recall, and F1 were the most common evaluation methods. Future work aims to improve the performance and understanding of user behavior and emotions, and address limitations such as the volume, diversity, and quality of datasets. This review includes research on different spoken languages and models and techniques.

Handwritten Arabic Character Recognition for Children Writing Using Convolutional Neural Network and Stroke Identification

Nov 03, 2022

Mais Alheraki, Rawan Al-Matham, Hend Al-Khalifa

Abstract:Automatic Arabic handwriting recognition is one of the recently studied problems in the field of Machine Learning. Unlike Latin languages, Arabic is a Semitic language that poses a harder challenge, especially with the variability of patterns caused by factors such as writer age. Most studies have focused on adults, with only one recent study on children. Moreover, many of the recent Machine Learning methods have focused on using Convolutional Neural Networks, a powerful class of neural networks that can extract complex features from images. In this paper we propose a convolutional neural network (CNN) model that recognizes children's handwriting with an accuracy of 91% on the Hijja dataset, a recent dataset built by collecting images of Arabic characters written by children, and 97% on the Arabic Handwritten Character Dataset (AHCD). The results show a good improvement over the model proposed by the Hijja dataset authors, yet they reveal a bigger challenge to solve for children's Arabic handwritten character recognition. Moreover, we propose a new approach that uses multiple models instead of a single model, based on the number of strokes in a character, and merge Hijja with AHCD, reaching an average prediction accuracy of 96%.

* 17
