Nursultan Askarbekuly

Quranic Audio Dataset: Crowdsourced and Labeled Recitation from Non-Arabic Speakers

May 04, 2024

Raghad Salameh, Mohamad Al Mdfaa, Nursultan Askarbekuly, Manuel Mazzara

Figures 1-4 for Quranic Audio Dataset: Crowdsourced and Labeled Recitation from Non-Arabic Speakers

Abstract: This paper addresses the challenge of learning to recite the Quran for non-Arabic speakers. We explore the possibility of crowdsourcing a carefully annotated Quranic dataset, on top of which AI models can be built to simplify the learning process. In particular, we use the volunteer-based crowdsourcing genre and implement a crowdsourcing API to gather audio assets. We integrated the API into an existing mobile application called NamazApp to collect audio recitations, and we developed a crowdsourcing platform called Quran Voice for annotating the gathered audio assets. As a result, we have collected around 7000 Quranic recitations from a pool of 1287 participants across more than 11 non-Arabic countries, and we have annotated 1166 recitations from the dataset in six categories. We have achieved a crowd accuracy of 0.77, an inter-rater agreement of 0.63 between the annotators, and an agreement of 0.89 between the labels assigned by the algorithm and the expert judgments.
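The abstract reports a crowd accuracy and two agreement scores without saying how they were computed. As a rough illustration only, the sketch below assumes that crowd votes are aggregated by simple majority and that agreement is measured with Cohen's kappa; the paper may use a different aggregation algorithm or statistic. The label names and the toy data are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label among one recitation's crowd votes."""
    return Counter(votes).most_common(1)[0][0]


def accuracy(predicted, reference):
    """Fraction of items where the predicted label equals the reference label."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(count_a) | set(count_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: three crowd votes per recitation (assumed labels).
crowd_votes = [
    ["correct", "correct", "mistake"],
    ["mistake", "mistake", "mistake"],
    ["correct", "mistake", "correct"],
    ["empty", "empty", "correct"],
]
expert_labels = ["correct", "mistake", "mistake", "empty"]

aggregated = [majority_vote(votes) for votes in crowd_votes]
crowd_accuracy = accuracy(aggregated, expert_labels)
kappa = cohen_kappa(aggregated, expert_labels)

print(f"crowd accuracy: {crowd_accuracy:.2f}")
print(f"kappa (aggregated vs. expert): {kappa:.2f}")
```

With more than two annotators per item, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would likely be a better fit than the pairwise measure shown here.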
