Title: Arabic Dialect Identification under Scrutiny: Limitations of Single-label Classification

Oct 20, 2023

Amr Keleg, Walid Magdy

Abstract: Automatic Arabic Dialect Identification (ADI) of text has gained great popularity since it was introduced in the early 2010s. Multiple datasets have been developed, and yearly shared tasks have been running since 2018. However, ADI systems are reported to fail in distinguishing between the micro-dialects of Arabic. We argue that the currently adopted framing of the ADI task as a single-label classification problem is one of the main reasons for that. We highlight the incompleteness of the dialect labels as a limitation and demonstrate how it impacts the evaluation of ADI systems. A manual error analysis of the predictions of an ADI system, performed by 7 native speakers of different Arabic dialects, revealed that $\approx$ 66% of the validated errors are not true errors. Consequently, we propose framing ADI as a multi-label classification task and give recommendations for designing new ADI datasets.

* Accepted to the ArabicNLP 2023 conference co-located with EMNLP 2023
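
The evaluation issue described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the dialect labels, and the toy data below are all hypothetical, and the sets of valid dialects are invented for illustration only. The sketch compares counting an error whenever the prediction differs from the single gold label with counting an error only when the prediction falls outside the set of dialects in which the sentence is valid.

```python
from typing import List, Set


def single_label_errors(gold: List[str], pred: List[str]) -> List[int]:
    """Indices where the prediction differs from the single gold label."""
    return [i for i, (g, p) in enumerate(zip(gold, pred)) if g != p]


def multi_label_errors(valid: List[Set[str]], pred: List[str]) -> List[int]:
    """Indices where the prediction is not among the valid dialects."""
    return [i for i, (v, p) in enumerate(zip(valid, pred)) if p not in v]


# Hypothetical toy data (not taken from the paper or any dataset).
gold = ["Egypt", "Syria", "Morocco", "Iraq"]
valid = [
    {"Egypt", "Sudan"},    # assumed valid in more than one dialect
    {"Syria", "Lebanon"},  # assumed valid in more than one dialect
    {"Morocco"},
    {"Iraq"},
]
pred = ["Egypt", "Lebanon", "Egypt", "Iraq"]

single = single_label_errors(gold, pred)
multi = multi_label_errors(valid, pred)

# Errors under single-label scoring that vanish once all valid
# dialects of each sentence are taken into account.
not_true_errors = [i for i in single if i not in multi]

print("single-label errors:", single)
print("multi-label errors:", multi)
print("not true errors:", not_true_errors)
```

In this toy setting, the second sentence is scored as an error under single-label evaluation even though the predicted dialect is one in which the sentence is assumed to be valid, which is the kind of case the manual error analysis is meant to surface.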