Abstract: Most unsupervised NLP models represent each word with a single point or single region in semantic space, while existing multi-sense word embeddings cannot represent longer word sequences such as phrases or sentences. We propose a novel embedding method for a text sequence (a phrase or a sentence) in which each sequence is represented by a distinct set of multi-mode codebook embeddings that capture different semantic facets of its meaning. The codebook embeddings can be viewed as cluster centers that summarize the distribution of possibly co-occurring words in a pre-trained word embedding space. We introduce an end-to-end trainable neural model that directly predicts the set of cluster centers from the input text sequence at test time. Our experiments show that the per-sentence codebook embeddings significantly improve performance on unsupervised sentence similarity and extractive summarization benchmarks. In phrase similarity experiments, we find that the multi-facet embeddings provide an interpretable semantic representation but do not outperform the single-facet baseline.
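To make the method described above more concrete, the following is a minimal, hypothetical sketch (not the authors' released code) of a model that maps a token sequence to K codebook embeddings, i.e., predicted cluster centers in a pre-trained word embedding space. The class names, layer sizes, facet-query decoder, and the simplified k-means-style loss are illustrative assumptions.

```python
# Hypothetical sketch: predict K "facet" cluster centers from an input sequence.
import torch
import torch.nn as nn


class MultiFacetEncoder(nn.Module):
    def __init__(self, vocab_size: int, emb_dim: int = 300, num_facets: int = 3):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, emb_dim)
        enc_layer = nn.TransformerEncoderLayer(d_model=emb_dim, nhead=6,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # One learned query per facet; attention over the encoded tokens
        # turns each query into one predicted cluster center.
        self.facet_queries = nn.Parameter(torch.randn(num_facets, emb_dim))
        self.attn = nn.MultiheadAttention(emb_dim, num_heads=6, batch_first=True)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> codebook: (batch, num_facets, emb_dim)
        h = self.encoder(self.tok_emb(token_ids))
        queries = self.facet_queries.unsqueeze(0).expand(token_ids.size(0), -1, -1)
        codebook, _ = self.attn(queries, h, h)
        return codebook


def facet_loss(codebook: torch.Tensor, cooccur_vecs: torch.Tensor) -> torch.Tensor:
    """Simplified k-means-style objective (an assumption, not the paper's exact
    loss): match each co-occurring word's pre-trained embedding to its nearest
    predicted facet and minimize that distance."""
    # codebook: (batch, K, dim); cooccur_vecs: (batch, num_words, dim)
    dists = torch.cdist(cooccur_vecs, codebook)   # (batch, num_words, K)
    return dists.min(dim=-1).values.mean()
```

At test time, only the forward pass would be needed: the K output vectors serve as the multi-facet representation of the input phrase or sentence.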
Abstract: Word sense induction (WSI), which addresses polysemy by unsupervised discovery of multiple word senses, resolves ambiguity for downstream NLP tasks and also makes word representations more interpretable. This paper proposes an accurate and efficient graph-based method for WSI that builds a global non-negative vector embedding basis (whose basis vectors are interpretable like topics) and clusters the basis indexes in the ego network of each polysemous word. By adopting distributional inclusion vector embeddings as our basis formation model, we avoid the expensive nearest neighbor search that plagues other graph-based methods without sacrificing the quality of sense clusters. Experiments on three datasets show that our proposed method produces similar or better sense clusters and embeddings compared with previous state-of-the-art methods while being significantly more efficient.
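As an illustration of the graph-based sense induction step, the sketch below takes a word's non-negative embedding over a global basis (topic-like vectors), keeps its most active basis indexes as the ego network, links basis pairs whose topic vectors are similar, and treats connected components as sense clusters. The top-k cutoff, similarity threshold, and use of connected components are illustrative assumptions rather than the paper's exact clustering procedure.

```python
# Hypothetical sketch: induce senses by clustering basis indexes in a word's ego network.
import numpy as np
from scipy.sparse.csgraph import connected_components


def induce_senses(word_vec: np.ndarray, basis: np.ndarray,
                  top_k: int = 10, sim_threshold: float = 0.4):
    """word_vec: non-negative weights over B basis vectors, shape (B,).
    basis: global basis (topic) embeddings, shape (B, dim).
    Returns a list of sense clusters, each a list of basis indexes."""
    # Ego network: the basis indexes most strongly activated by the word.
    ego = np.argsort(-word_vec)[:top_k]
    vecs = basis[ego]
    vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sim = vecs @ vecs.T                       # cosine similarity among ego nodes
    adj = (sim >= sim_threshold).astype(int)  # edges between similar topics
    np.fill_diagonal(adj, 0)
    n_senses, labels = connected_components(adj, directed=False)
    return [ego[labels == s].tolist() for s in range(n_senses)]
```

Because the graph is built over basis (topic) indexes rather than over nearest-neighbor words, the clustering step touches only a small, fixed-size ego network per word, which is where the efficiency gain described in the abstract comes from.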