George Mulcaire

Hierarchical Character-Word Models for Language Identification

Aug 10, 2016

Aaron Jaech, George Mulcaire, Shobhit Hathi, Mari Ostendorf, Noah A. Smith

Abstract: Social media messages' brevity and unconventional spelling pose a challenge to language identification. We introduce a hierarchical model that learns character and contextualized word-level representations for language identification. Our method performs well against strong baselines, and can also reveal code-switching.

Many Languages, One Parser

Jul 26, 2016

Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, Noah A. Smith

Abstract: We train one multilingual model for dependency parsing and use it to parse sentences in several languages. The parsing model uses (i) multilingual word clusters and embeddings; (ii) token-level language information; and (iii) language-specific features (fine-grained POS tags). This input representation enables the parser not only to parse effectively in multiple languages, but also to generalize across languages based on linguistic universals and typological similarities, making it more effective to learn from limited annotations. Our parser's performance compares favorably to strong baselines in a range of data scenarios, including when the target language has a large treebank, a small treebank, or no treebank for training.
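
As a rough illustration of the three-part input representation described in this abstract, the sketch below concatenates a multilingual word vector, a one-hot language indicator, and a POS-tag vector for a single token. It is a toy example, not the authors' implementation: the lookup tables, their values, the vector sizes, and the names `token_features` and `one_hot` are all hypothetical, and a real parser would learn these representations rather than hard-code them.

```python
from typing import Dict, List, Optional

# Hypothetical toy tables (assumed values, for illustration only).
# (i) Multilingual word embeddings: translations sit close together
#     in one shared space.
WORD_EMBEDDINGS: Dict[str, List[float]] = {
    "dog": [0.9, 0.1],
    "perro": [0.8, 0.2],
}
UNKNOWN_WORD: List[float] = [0.0, 0.0]

# (ii) Token-level language information, encoded here as a one-hot vector.
LANGUAGES: List[str] = ["en", "es"]

# (iii) Language-specific features: fine-grained POS tags, when available.
POS_EMBEDDINGS: Dict[str, List[float]] = {
    "NOUN": [1.0, 0.0],
    "VERB": [0.0, 1.0],
}
NO_POS: List[float] = [0.0, 0.0]


def one_hot(item: str, vocabulary: List[str]) -> List[float]:
    """Return a vector with 1.0 at the position of `item`, 0.0 elsewhere."""
    return [1.0 if item == entry else 0.0 for entry in vocabulary]


def token_features(word: str, lang: str, pos: Optional[str] = None) -> List[float]:
    """Concatenate the word, language, and POS vectors for one token."""
    word_vec = WORD_EMBEDDINGS.get(word, UNKNOWN_WORD)
    lang_vec = one_hot(lang, LANGUAGES)
    # Fall back to a zero vector when no POS tag is supplied or known.
    pos_vec = POS_EMBEDDINGS.get(pos, NO_POS) if pos is not None else NO_POS
    return word_vec + lang_vec + pos_vec


print(token_features("perro", "es", "NOUN"))
```

Because every token is mapped into the same feature layout regardless of language, a single parser can consume sentences from any of the covered languages, which is the property the abstract relies on for cross-lingual generalization.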

Massively Multilingual Word Embeddings

May 21, 2016

Waleed Ammar, George Mulcaire, Yulia Tsvetkov, Guillaume Lample, Chris Dyer, Noah A. Smith

Abstract: We introduce new methods for estimating and evaluating embeddings of words in more than fifty languages in a single shared embedding space. Our estimation methods, multiCluster and multiCCA, use dictionaries and monolingual data; they do not require parallel data. Our new evaluation method, multiQVEC-CCA, is shown to correlate better than previous ones with two downstream tasks (text categorization and parsing). We also describe a web portal for evaluation that will facilitate further research in this area, along with open-source releases of all our methods.
