Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

Apr 19, 2020

Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao(+13 more)

Figure 1 for XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

Figure 2 for XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

Figure 3 for XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

Figure 4 for XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

Share this with someone who'll enjoy it:

Abstract:In this paper, we introduce XGLUE, a new benchmark dataset to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora, and evaluate their performance across a diverse set of cross-lingual tasks. Comparing to GLUE (Wang et al.,2019), which is labeled in English and includes natural language understanding tasks only, XGLUE has three main advantages: (1) it provides two corpora with different sizes for cross-lingual pre-training; (2) it provides 11 diversified tasks that cover both natural language understanding and generation scenarios; (3) for each task, it provides labeled data in multiple languages. We extend a recent cross-lingual pre-trained model Unicoder (Huang et al., 2019) to cover both understanding and generation tasks, which is evaluated on XGLUE as a strong baseline. We also evaluate the base versions (12-layer) of Multilingual BERT, XLM and XLM-R for comparison.

View paper on

Share this with someone who'll enjoy it:

Title:XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

Paper and Code