Title: A Focused Study to Compare Arabic Pre-training Models on Newswire IE Tasks

Apr 30, 2020

Wuwei Lan, Yang Chen, Wei Xu, Alan Ritter

Abstract: Arabic is a morphologically rich language, which poses many challenges for information extraction (IE) tasks, including Named Entity Recognition (NER), Part-of-Speech tagging (POS), Argument Role Labeling (ARL), and Relation Extraction (RE). A few multilingual pre-trained models have been proposed and show good performance for Arabic; however, most experimental results are reported on language understanding tasks such as natural language inference, question answering, and sentiment analysis. Their performance on IE tasks is less well known, in particular their cross-lingual transfer capability from English to Arabic. In this work, we pre-train a Gigaword-based bilingual language model (GigaBERT) to study these two distant languages as well as zero-shot transfer learning on information extraction tasks. Our GigaBERT model can outperform mBERT and XLM-R-base on NER, POS, and ARL tasks with regard to per-language and/or zero-shot transfer performance. We make our pre-trained models publicly available at https://github.com/lanwuwei/GigaBERT to facilitate research in this field.
