Ahmed H. Yousef

ANER: Arabic and Arabizi Named Entity Recognition using Transformer-Based Approach

Aug 28, 2023

Abdelrahman "Boda" Sadallah, Omar Ahmed, Shimaa Mohamed, Omar Hatem, Doaa Hesham, Ahmed H. Yousef

Abstract: Named Entity Recognition (NER) is one of the main tasks of Natural Language Processing (NLP). It is used in many applications and can also serve as an intermediate step for other tasks. We present ANER, a web-based named entity recognizer for Arabic and Arabizi. The model is built on BERT, a transformer-based encoder. It can recognize 50 different entity classes covering various fields. We trained our model on the WikiFANE_Gold dataset, which consists of Wikipedia articles. We achieved an F1 score of 88.7%, which beats CAMeL Tools' F1 score of 83% on the ANERcorp dataset, which has only 4 classes. We also obtained an F1 score of 77.7% on the NewsFANE_Gold dataset, which contains out-of-domain data from news articles. The system is deployed behind a user-friendly web interface that accepts input in Arabic or Arabizi. It lets users explore the entities in the text by highlighting them, and it can direct users to Wikipedia for information about each entity. The website supports NER with either our model or CAMeL Tools' model. ANER is publicly accessible at http://www.aner.online. We also deployed our model on HuggingFace at https://huggingface.co/boda/ANER so that developers can test and use it.
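
The abstract reports F1 scores but does not say how they are computed (token-level or entity-level, micro or macro averaged). As a rough illustration only, the sketch below computes exact-match span F1, one common convention for NER. The function name `span_f1` and the `(start, end, label)` span format are assumptions made for this example and are not taken from the paper.

```python
def span_f1(gold, predicted):
    """Exact-match F1 over entity spans given as (start, end, label) tuples."""
    gold_set, pred_set = set(gold), set(predicted)
    # A prediction counts as correct only if boundaries and label both match.
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): three gold entities, two predictions, one exact match.
gold = [(0, 2, "PER"), (5, 6, "LOC"), (8, 9, "ORG")]
predicted = [(0, 2, "PER"), (5, 6, "ORG")]
print(round(span_f1(gold, predicted), 3))
```

In this toy case the second prediction has the right boundaries but the wrong label, so it counts as both a false positive and a missed gold entity.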

Egyptian Dialect Stopword List Generation from Social Network Data

Apr 13, 2015

Walaa Medhat, Ahmed H. Yousef, Hoda Korashy

Abstract: This paper proposes a methodology for generating a stopword list from online social network (OSN) corpora in Egyptian Dialect (ED). The aim of the paper is to investigate the effect of removing ED stopwords on the Sentiment Analysis (SA) task. Previously generated stopword lists were built for Modern Standard Arabic (MSA), which is not the language commonly used in OSN. We have generated a stopword list for the Egyptian dialect to be used with the OSN corpora. We compare the efficiency of text classification when using the generated list, previously generated MSA lists, and the Egyptian dialect list combined with the MSA list. Text classification was performed using Naïve Bayes and Decision Tree classifiers with two feature selection approaches, unigrams and bigrams. The experiments show that removing ED stopwords gives better performance than using MSA stopword lists only.

* This paper extends an earlier paper presented at the language engineering conference, arXiv:1410.1135, and has been accepted by the language engineering journal. Although the structure is nearly the same, this version adds extensive cross-validation and adds many negation words to the dataset.
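
The abstract does not spell out how the stopword list is derived, so the following is only a minimal sketch of one common corpus-based heuristic: rank tokens by frequency and keep those that occur in a large share of documents. The function names, the `top_k` and `min_doc_ratio` parameters, and the transliterated toy corpus are all hypothetical and are not the authors' method or data.

```python
from collections import Counter


def generate_stopwords(documents, top_k=50, min_doc_ratio=0.5):
    """Return up to top_k frequent tokens that appear in at least
    min_doc_ratio of the documents (whitespace tokenization)."""
    term_freq = Counter()
    doc_freq = Counter()
    for doc in documents:
        tokens = doc.split()
        term_freq.update(tokens)
        doc_freq.update(set(tokens))  # count each token once per document
    n_docs = len(documents)
    candidates = [
        (token, freq)
        for token, freq in term_freq.items()
        if doc_freq[token] / n_docs >= min_doc_ratio
    ]
    # Most frequent first; ties broken alphabetically for a stable order.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [token for token, _ in candidates[:top_k]]


def remove_stopwords(text, stopwords):
    """Drop every token of text that is in the stopword list."""
    stop = set(stopwords)
    return " ".join(token for token in text.split() if token not in stop)


# Toy corpus of made-up transliterated reviews (illustrative only).
corpus = ["el film da helw", "el akl da wehesh", "el makan da gameel", "film helw"]
stopwords = generate_stopwords(corpus, top_k=5, min_doc_ratio=0.6)
print(stopwords)
print(remove_stopwords("el film da helw", stopwords))
```

A purely frequency-based list can also capture negation words, which carry sentiment, so in practice such a list would likely need manual review before being used for sentiment analysis.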

Corpora Preparation and Stopword List Generation for Arabic data in Social Network

Oct 05, 2014

Walaa Medhat, Ahmed H. Yousef, Hoda Korashy

Abstract: This paper proposes a methodology for preparing Arabic-language corpora from online social network (OSN) and review sites for the Sentiment Analysis (SA) task. The paper also proposes a methodology for generating a stopword list from the prepared corpora. The aim of the paper is to investigate the effect of removing stopwords on the SA task. The problem is that previously generated stopword lists were built for Modern Standard Arabic (MSA), which is not the language commonly used in OSN. We have generated a stopword list for the Egyptian dialect and a corpus-based list to be used with the OSN corpora. We compare the efficiency of text classification when using the generated lists, previously generated MSA lists, and the Egyptian dialect list combined with the MSA list. Text classification was performed using Naïve Bayes and Decision Tree classifiers with two feature selection approaches, unigrams and bigrams. The experiments show that the general lists containing Egyptian dialect words give better performance than using MSA stopword lists only.

* Language Engineering Conference 2014, Cairo, Egypt, 1-3 December 2014

Cross-Language Personal Name Mapping

May 24, 2014

Ahmed H. Yousef

Abstract: Name matching between multiple natural languages is an important step in cross-enterprise integration applications and data mining. It is difficult to decide whether two syntactic values (names) from two heterogeneous data sources are alternative designations of the same semantic entity (person). This process becomes more difficult with the Arabic language due to several factors, including spelling and pronunciation variation, dialects, special vowel and consonant distinctions, and other linguistic characteristics. This paper proposes a new framework for name matching between the Arabic language and other languages. The framework uses a dictionary based on a newly proposed version of the Soundex algorithm to encapsulate the recognition of special features of Arabic names. The framework also proposes a new proximity matching algorithm to suit the high importance of order sensitivity in Arabic name matching. New performance evaluation metrics are proposed as well. The framework is implemented and verified empirically in several case studies, demonstrating substantial improvements over other well-known techniques found in the literature.

* International Journal of Computational Linguistics Research, vol 4, issue 4, December 2013
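
For readers unfamiliar with Soundex, the sketch below implements the classic American Soundex code for Latin-script names. It is not the new Arabic-aware version proposed in the paper, whose details the abstract does not give; the function name `soundex` and the sample names are chosen here for illustration only.

```python
# Classic Soundex consonant groups: letters in one group share a digit.
_CODES = {}
for _letters, _digit in (
    ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
    ("L", "4"), ("MN", "5"), ("R", "6"),
):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character classic Soundex code of a Latin-script name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped and do not separate equal codes
        code = _CODES.get(ch, "")  # vowels and Y map to "" and reset the run
        if code and code != previous:
            encoded += code
        previous = code
    return (encoded + "000")[:4]  # pad with zeros, truncate to 4 characters


# Two common transliterations of the same Arabic name get the same code.
print(soundex("Mohamed"), soundex("Muhammad"))
```

Matching on the code instead of the raw string tolerates vowel and doubled-consonant variation between transliterations, which is the kind of spelling variation the abstract describes. Classic Soundex ignores token order within a full name, which is the gap the paper's proximity matching algorithm is said to address.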
