Most work in text classification and Natural Language Processing (NLP) focuses on English or a handful of other languages that have text corpora of hundreds of millions of words. This is creating a new version of the digital divide: the artificial intelligence (AI) divide. Transfer-based approaches, such as Cross-Lingual Text Classification (CLTC), the task of categorizing texts written in different languages into a common taxonomy, are a promising solution to the emerging AI divide. Recent work on CLTC has focused on demonstrating the benefits of using bilingual word embeddings as features, reducing the CLTC problem itself to a mere benchmark built on a simple averaged perceptron. In this paper, we explore two flavors of the CLTC problem more extensively and systematically: news topic classification and textual churn intent detection (TCID) in social media. In particular, we test the hypothesis that embeddings learned in task context are more effective, by multi-task learning of multilingual word embeddings and text classification; we explore neural architectures for CLTC; and we move from bilingual to multilingual word embeddings. Across all architectures, word-embedding types, and datasets, we observe consistent gains from multilingual joint training, especially for low-resource languages.
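To make the multi-task idea concrete, the following is a minimal PyTorch-style sketch, not the paper's actual implementation: a shared multilingual embedding table is updated by both a document-classification loss and an alignment loss that pulls embeddings of translation word pairs together. The model class, the alignment objective, the loss weighting, and all hyperparameters below are hypothetical choices for illustration only.

```python
# Hypothetical sketch of multi-task learning of multilingual word
# embeddings and text classification. All names and hyperparameters
# are illustrative, not taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointCLTCModel(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int, num_classes: int):
        super().__init__()
        # Shared embedding table over a joint vocabulary of all languages
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def classify(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Average word embeddings into a document vector, then classify
        doc_vec = self.embedding(token_ids).mean(dim=1)
        return self.classifier(doc_vec)

    def alignment_loss(self, src_ids: torch.Tensor, tgt_ids: torch.Tensor) -> torch.Tensor:
        # Pull embeddings of translation word pairs together (cosine distance)
        src = self.embedding(src_ids)
        tgt = self.embedding(tgt_ids)
        return (1.0 - F.cosine_similarity(src, tgt, dim=-1)).mean()

model = JointCLTCModel(vocab_size=50_000, embed_dim=300, num_classes=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy batch: 8 documents of 20 tokens each, 4 topic classes,
# and 32 translation word pairs from a bilingual dictionary
docs = torch.randint(0, 50_000, (8, 20))
labels = torch.randint(0, 4, (8,))
src_words = torch.randint(0, 50_000, (32,))
tgt_words = torch.randint(0, 50_000, (32,))

# Joint objective: classification loss plus weighted alignment loss,
# so the embeddings are shaped by the task and by cross-lingual signal
logits = model.classify(docs)
loss = F.cross_entropy(logits, labels) + 0.5 * model.alignment_loss(src_words, tgt_words)
loss.backward()
optimizer.step()
```

Because both losses backpropagate into the same embedding table, the word vectors are shaped simultaneously by the classification task and by the cross-lingual alignment signal, which is the essence of the joint-training hypothesis tested here.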