Bara' Al-Jawarneh

Neural Arabic Text Diacritization: State of the Art Results and a Novel Approach for Machine Translation

Nov 08, 2019

Ali Fadel, Ibraheem Tuffaha, Bara' Al-Jawarneh, Mahmoud Al-Ayyoub

Abstract: In this work, we present several deep learning models for the automatic diacritization of Arabic text. Our models are built using two main approaches, viz. Feed-Forward Neural Network (FFNN) and Recurrent Neural Network (RNN), with several enhancements such as 100-hot encoding, embeddings, Conditional Random Field (CRF) and Block-Normalized Gradient (BNG). The models are tested on the only freely available benchmark dataset, and the results show that our models are either better than or on par with other models, which, unlike ours, require language-dependent post-processing steps. Moreover, we show that diacritics in Arabic can be used to enhance the models of NLP tasks such as Machine Translation (MT) by proposing the Translation over Diacritization (ToD) approach.

* 18 pages, 17 figures, 14 tables

Arabic Text Diacritization Using Deep Neural Networks

Apr 25, 2019

Ali Fadel, Ibraheem Tuffaha, Bara' Al-Jawarneh, Mahmoud Al-Ayyoub

Abstract: Diacritization of Arabic text is both an interesting and a challenging problem, with applications ranging from speech synthesis to helping students learn the Arabic language. As with many other problems in Arabic language processing, the limited effort invested in this problem and the lack of available (open-source) resources hinder progress towards solving it. This work provides a critical review of the existing systems, measures and resources for Arabic text diacritization. Moreover, it introduces a much-needed free-for-all cleaned dataset that can be easily used to benchmark any work on Arabic diacritization. Extracted from the Tashkeela Corpus, the dataset consists of 55K lines containing about 2.3M words. After constructing the dataset, existing tools and systems are tested on it. The results of the experiments show that the neural Shakkala system significantly outperforms traditional rule-based approaches and other closed-source tools with a Diacritic Error Rate (DER) of 2.88% compared with 13.78%, which is the best DER for the non-neural approaches (obtained by the Mishkal tool).

* 7 pages, 4 figures, 15 tables
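
The Diacritic Error Rate quoted in the abstract above can be illustrated with a minimal sketch. It assumes DER is the fraction of non-whitespace characters whose predicted diacritics differ from the reference, and that both strings share the same undiacritized text. The paper's exact definition (for example, whether undiacritized characters or case endings are counted) may differ, and the function names here are hypothetical.

```python
# Arabic diacritic marks (tanween, short vowels, shadda, sukun): U+064B..U+0652.
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def split_diacritics(text):
    """Return a list of (base_char, diacritics) pairs for a string."""
    pairs = []
    for ch in text:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base character
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
            # a mark with no preceding base character is ignored
        else:
            pairs.append((ch, ""))
    return pairs


def diacritic_error_rate(reference, prediction):
    """Fraction of non-whitespace characters with wrong diacritics.

    Assumption: both inputs contain the same base characters in the same
    order; only the diacritics may differ.
    """
    ref = split_diacritics(reference)
    pred = split_diacritics(prediction)
    if [base for base, _ in ref] != [base for base, _ in pred]:
        raise ValueError("reference and prediction differ in base characters")

    scored = [
        (ref_marks, pred_marks)
        for (base, ref_marks), (_, pred_marks) in zip(ref, pred)
        if not base.isspace()
    ]
    if not scored:
        return 0.0
    # Compare marks as sorted sequences so shadda+vowel order does not matter.
    errors = sum(1 for r, p in scored if sorted(r) != sorted(p))
    return errors / len(scored)


# Three letters, each with fatha; the prediction puts kasra on the second one.
reference = "\u0643\u064E\u062A\u064E\u0628\u064E"
prediction = "\u0643\u064E\u062A\u0650\u0628\u064E"
der = diacritic_error_rate(reference, prediction)
```

In this toy example one of three characters carries the wrong mark, so `der` is one third. A benchmark run would aggregate errors and character counts over the whole test set rather than averaging per-line rates.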
