Abstract: Online misogyny has become a growing concern for Arab women, who experience gender-based online abuse on a daily basis. Automatic misogyny detection systems can help curb anti-women toxic content in Arabic. Developing such systems is hindered by the lack of Arabic misogyny benchmark datasets. In this paper, we introduce an Arabic Levantine Twitter dataset for Misogynistic language (LeT-Mi), the first benchmark dataset for Arabic misogyny. We further provide a detailed review of the dataset creation and annotation phases. The consistency of the annotations was verified through inter-rater agreement measures. Moreover, LeT-Mi was used as an evaluation dataset in binary, multi-class, and target classification tasks conducted with several state-of-the-art machine learning systems, along with a Multi-Task Learning (MTL) configuration. The results indicate that the performance of these systems is consistent with state-of-the-art results for languages other than Arabic, while employing MTL improved performance on the misogyny and target classification tasks.
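The abstract does not name the inter-rater agreement measure that was used. As an illustration only, the sketch below computes Cohen's kappa for two annotators, one common choice for this kind of check. The function name `cohen_kappa` and the label values are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Assumption: this is one common agreement measure, not necessarily
    the one used for LeT-Mi.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")

    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    annotator_1 = ["misogyny", "misogyny", "none", "none"]
    annotator_2 = ["misogyny", "none", "none", "none"]
    print(cohen_kappa(annotator_1, annotator_2))
```

Kappa corrects raw agreement for the agreement expected by chance, which matters when one class (for example, non-misogynistic tweets) dominates the data.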
Abstract: Social media reflects public attitudes towards specific events. Events are often related to persons, locations, or organizations, the so-called Named Entities. Named Entities can therefore be treated as sentiment-bearing components. In this paper, we go beyond Named Entity recognition to exploit sentiment-annotated Named Entities in Arabic sentiment analysis. To this end, we develop an algorithm that detects the sentiment of a Named Entity based on the majority of attitudes expressed towards it. This enabled us to tag Named Entities with sentiment labels and, thus, to include them in a sentiment analysis framework of two models: supervised and lexicon-based. Both models were applied to datasets of multi-dialectal content. The results revealed that Named Entities have no considerable impact on the supervised model, while employing them in the lexicon-based model improved the classification performance and outperformed most of the baseline systems.
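The majority-based tagging idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes each text already comes with a list of recognized Named Entities and a sentiment label, and it leaves an entity untagged when its top two labels tie. The function name `entity_sentiments` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def entity_sentiments(annotated_texts):
    """Tag each Named Entity with the majority sentiment of the texts mentioning it.

    annotated_texts: iterable of (entities, label) pairs, where `entities`
    is a list of entity strings found in one text and `label` is that
    text's sentiment. Both the input shape and the tie rule are assumptions.
    """
    votes = defaultdict(Counter)
    for entities, label in annotated_texts:
        # Count each entity once per text, even if it is mentioned repeatedly.
        for entity in set(entities):
            votes[entity][label] += 1

    tags = {}
    for entity, counts in votes.items():
        ranked = counts.most_common(2)
        top_label, top_count = ranked[0]
        # Skip entities with no clear majority between the top two labels.
        if len(ranked) > 1 and ranked[1][1] == top_count:
            continue
        tags[entity] = top_label
    return tags


if __name__ == "__main__":
    corpus = [
        (["Beirut"], "positive"),
        (["Beirut", "UN"], "positive"),
        (["Beirut"], "negative"),
        (["UN"], "negative"),
    ]
    print(entity_sentiments(corpus))
```

The resulting entity-to-label mapping could then be added to a sentiment lexicon or used as extra features, which corresponds to the two model types the abstract mentions.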