Denis Teslenko

Word Sense Disambiguation for 158 Languages using Word Embeddings Only

Mar 14, 2020

Varvara Logacheva, Denis Teslenko, Artem Shelmanov, Steffen Remus, Dmitry Ustalov, Andrey Kutuzov, Ekaterina Artemova, Chris Biemann, Simone Paolo Ponzetto, Alexander Panchenko

Abstract: Disambiguation of word senses in context is easy for humans, but is a major challenge for automatic approaches. Sophisticated supervised and knowledge-based models were developed to solve this task. However, (i) the inherent Zipfian distribution of supervised training instances for a given word and/or (ii) the quality of linguistic knowledge representations motivate the development of completely unsupervised and knowledge-free approaches to word sense disambiguation (WSD). They are particularly useful for under-resourced languages, which lack the resources needed to build either supervised or knowledge-based models. In this paper, we present a method that takes as input a standard pre-trained word embedding model and induces a fully-fledged word sense inventory, which can be used for disambiguation in context. We use this method to induce a collection of sense inventories for 158 languages on the basis of the original pre-trained fastText word embeddings by Grave et al. (2018), enabling WSD in these languages. Models and system are available online.

* 10 pages, 5 figures, 4 tables, accepted at LREC 2020

An Unsupervised Word Sense Disambiguation System for Under-Resourced Languages

Apr 27, 2018

Dmitry Ustalov, Denis Teslenko, Alexander Panchenko, Mikhail Chernoskutov, Chris Biemann, Simone Paolo Ponzetto

Abstract: In this paper, we present Watasense, an unsupervised system for word sense disambiguation. Given a sentence, the system chooses the most relevant sense of each input word with respect to the semantic similarity between the given sentence and the synset constituting the sense of the target word. Watasense has two modes of operation. The sparse mode uses the traditional vector space model to find the word sense most similar to the context. The dense mode, instead, uses synset embeddings to cope with the sparsity problem. We describe the architecture of the system and evaluate it on three different lexical semantic resources for Russian. We found that the dense mode substantially outperforms the sparse one on all datasets according to the adjusted Rand index.

* In Proceedings of the 11th Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan
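
The sparse mode described in the abstract above can be illustrated with a minimal sketch. This is not the Watasense code: the toy inventory, the function names `cosine` and `disambiguate`, the bag-of-words weighting, and the choice to leave the target word out of its own context are all assumptions made for illustration. The sketch represents the sentence and each candidate synset as sparse count vectors and picks the sense whose synset has the highest cosine similarity to the context.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(tokens, target, inventory):
    """Return the sense id of `target` whose synset is most similar to the context.

    `inventory` is a hypothetical structure: word -> {sense_id: [synset words]}.
    Returns None if the word has no senses in the inventory.
    """
    # Assumption: the target word itself is excluded from the context vector.
    context = Counter(t for t in tokens if t != target)
    best_id, best_score = None, -1.0
    for sense_id, synset in inventory.get(target, {}).items():
        score = cosine(context, Counter(synset))
        if score > best_score:
            best_id, best_score = sense_id, score
    return best_id


# Toy inventory, invented for this example only.
INVENTORY = {
    "bank": {
        "bank#1": ["bank", "money", "deposit", "loan"],
        "bank#2": ["bank", "river", "shore", "water"],
    }
}

sentence = ["she", "sat", "on", "the", "bank", "of", "the", "river"]
print(disambiguate(sentence, "bank", INVENTORY))
```

A dense mode in the same spirit would replace the count vectors with averaged word or synset embeddings, so that a context can match a sense even when it shares no words with the synset.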
