Title: AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples

Apr 17, 2021

Qianchu Liu, Edoardo M. Ponti, Diana McCarthy, Ivan Vulić, Anna Korhonen

Abstract: Capturing word meaning in context and distinguishing between correspondences and variations across languages is key to building successful multilingual and cross-lingual text representation models. However, existing multilingual datasets that evaluate lexical semantics "in-context" have several limitations: (1) their language coverage is restricted to high-resource languages and skewed in favor of only a few language families and areas; (2) their design makes the task solvable via superficial cues, which results in artificially inflated (and sometimes super-human) performance of pretrained encoders on many target languages and limits their usefulness for model probing and diagnostics; and (3) they offer no support for cross-lingual evaluation. To address these gaps, we present AM2iCo (Adversarial and Multilingual Meaning in Context), a wide-coverage cross-lingual and multilingual evaluation set. It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts for 14 language pairs. We conduct a series of experiments in a wide range of setups and demonstrate the challenging nature of AM2iCo. The results reveal that current SotA pretrained encoders substantially lag behind human performance, and the largest gaps are observed for low-resource languages and languages dissimilar to English.
