Mattia Di Gangi

Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text

Jun 06, 2023

Parnia Bahar, Mattia Di Gangi, Nick Rossenbach, Mohammad Zeineldeen

Abstract: Automatic Arabic diacritization is useful in many applications, ranging from reading support for language learners to accurate pronunciation prediction for downstream tasks such as speech synthesis. While most previous work has focused on models that operate on raw non-diacritized text, production systems can gain accuracy by first letting humans partly annotate ambiguous words. In this paper, we propose 2SDiac, a multi-source model that can effectively support optional diacritics in the input to inform all predictions. We also introduce Guided Learning, a training scheme that leverages diacritics given in the input under different levels of random masking. We show that hints provided at test time affect more output positions than those annotated. Moreover, experiments on two common benchmarks show that our approach i) greatly outperforms the baseline even when evaluated on non-diacritized text; and ii) achieves state-of-the-art results while reducing the parameter count by over 60%.

* Arabic text diacritization, partially-diacritized text, Arabic natural language processing
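
The abstract describes training with diacritic hints that are randomly masked at different levels. A minimal sketch of how such training examples might be built is given below. The function names, the per-letter masking, and the single `mask_prob` parameter are illustrative assumptions and are not taken from the paper.

```python
import random

# Arabic diacritic marks (tanween, short vowels, shadda, sukun): U+064B..U+0652.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def split_diacritics(text):
    """Return (letters, marks); marks[i] holds the diacritics following letters[i].

    A diacritic with no preceding letter is kept as a letter of its own.
    """
    letters, marks = [], []
    for ch in text:
        if ch in DIACRITICS and letters:
            marks[-1] += ch
        else:
            letters.append(ch)
            marks.append("")
    return letters, marks


def mask_hints(marks, mask_prob, rng):
    """Hide each per-letter hint with probability mask_prob (assumed scheme)."""
    return [m if rng.random() >= mask_prob else "" for m in marks]


def make_training_example(text, mask_prob, rng):
    """Build one example: bare letters, partially masked hints, full targets."""
    letters, marks = split_diacritics(text)
    return {
        "letters": letters,
        "hints": mask_hints(marks, mask_prob, rng),
        "targets": marks,
    }


if __name__ == "__main__":
    rng = random.Random(0)
    word = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba, fully diacritized
    for p in (0.0, 0.5, 1.0):
        print(p, make_training_example(word, p, rng)["hints"])
```

With `mask_prob=0.0` the model would see every diacritic as a hint, and with `mask_prob=1.0` it would see raw non-diacritized text, so varying the value covers the range between the two input conditions the abstract mentions.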

Controlling the Output Length of Neural Machine Translation

Oct 25, 2019

Surafel Melaku Lakew, Mattia Di Gangi, Marcello Federico

Abstract: The recent advances introduced by neural machine translation (NMT) are rapidly expanding the application fields of machine translation, as well as reshaping the quality level to be targeted. In particular, if translations have to fit some given layout, quality should not only be measured in terms of adequacy and fluency, but also length. Exemplary cases are the translation of document files, subtitles, and scripts for dubbing, where the output length should ideally be as close as possible to the length of the input text. This paper addresses for the first time, to the best of our knowledge, the problem of controlling the output length in NMT. We investigate two methods for biasing the output length with a transformer architecture: i) conditioning the output on a given target-source length-ratio class and ii) enriching the transformer positional embedding with length information. Our experiments show that both methods can induce the network to generate shorter translations and to acquire interpretable linguistic skills.

* To appear at the 16th International Workshop on Spoken Language Translation (IWSLT), 2019
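
The first method in the abstract, conditioning on a target-source length-ratio class, can be illustrated with a small preprocessing sketch. The class names, the token format, the thresholds `low` and `high`, and the choice to count characters are all hypothetical values chosen for illustration; the abstract does not specify them.

```python
def length_ratio_class(source, target, low=0.95, high=1.05):
    """Map a sentence pair to a length-ratio class token.

    The ratio is target length over source length, counted in characters.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    ratio = len(target) / max(len(source), 1)
    if ratio < low:
        return "<short>"
    if ratio > high:
        return "<long>"
    return "<normal>"


def tag_source(source, target, low=0.95, high=1.05):
    """Prepend the class token to the source side of a training pair."""
    return f"{length_ratio_class(source, target, low, high)} {source}"


if __name__ == "__main__":
    src = "hello world"
    for tgt in ("hi", "hello world", "hello world, how are you"):
        print(tag_source(src, tgt))
```

Under this kind of scheme, the token is computed from the reference translation during training, and at inference time the user would prepend the desired class (for example `<short>`) to request a translation in that length range.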
