Communications Research Laboratory, Japan
Abstract: This paper proposes a computationally feasible method for measuring context-sensitive semantic distance between words. The distance is computed by adaptive scaling of a semantic space. In the semantic space, each word in the vocabulary V is represented by a multi-dimensional vector obtained from an English dictionary through principal component analysis. Given a word set C that specifies a context for measuring word distance, each dimension of the semantic space is scaled up or down according to the distribution of C in the semantic space. In the space thus transformed, the distance between words in V becomes dependent on the context C. An evaluation through a word prediction task shows that the proposed measurement successfully extracts the context of a text.
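The scaling step can be illustrated with a small sketch. The abstract does not give the scaling rule, so the code assumes one: dimension i is multiplied by the ratio of the spread (population standard deviation) of all of V to the spread of C along that dimension. Dimensions on which the context words cluster tightly are therefore stretched. The names `scale_factors` and `context_distance`, the `eps` parameter, and the toy vectors are hypothetical; this is not presented as the paper's exact formulation.

```python
import math
from statistics import pstdev


def scale_factors(space, context, eps=1e-9):
    """Hypothetical rule: weight dimension i by sd_i(V) / sd_i(C).

    space   -- dict mapping each word in V to a list of floats
    context -- list of words forming the context C (all must be in space)
    """
    dims = len(next(iter(space.values())))
    factors = []
    for i in range(dims):
        sd_v = pstdev([vec[i] for vec in space.values()])
        sd_c = pstdev([space[w][i] for w in context])
        # eps avoids division by zero when C has no spread on dimension i
        factors.append(sd_v / (sd_c + eps))
    return factors


def context_distance(space, context, w1, w2):
    """Euclidean distance between w1 and w2 in the space rescaled for C."""
    f = scale_factors(space, context)
    return math.sqrt(sum((fi * (a - b)) ** 2
                         for fi, a, b in zip(f, space[w1], space[w2])))


if __name__ == "__main__":
    space = {"a": [0.0, 0.0], "b": [1.0, 0.0],
             "c": [0.0, 1.0], "d": [2.0, 2.0]}
    # C = {a, b} has no spread on dimension 1, so that dimension is stretched
    print(context_distance(space, ["a", "b"], "a", "b"))
    print(context_distance(space, ["a", "b"], "a", "c"))
```

With C = {a, b}, the pair a, c (which differ only on the stretched dimension) ends up far apart, while a, b stay comparatively close; swapping the context to {a, c} reverses this, which is the context dependence the abstract describes.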
Abstract: This paper proposes a new indicator of text structure, called the lexical cohesion profile (LCP), which locates segment boundaries in a text. A text segment is a coherent scene; the words in a segment are linked together via lexical cohesion relations. LCP records the mutual similarity of words along a sequence of text. The similarity of words, which represents their cohesiveness, is computed using a semantic network. Comparison with the text segments marked by a number of human subjects shows that LCP closely correlates with human judgments. LCP may also provide valuable information for resolving anaphora and ellipsis.
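A minimal sketch of computing a cohesion profile and reading boundaries off it. Several details are assumptions not taken from the abstract: the value at position i is the mean pairwise similarity of the words within `half_window` positions of i; the semantic-network similarity is replaced by an arbitrary caller-supplied function `sim` (here a toy topic-label match); and a boundary candidate is any local minimum of the profile. The functions `lcp` and `boundaries` and the toy data are hypothetical.

```python
from itertools import combinations


def lcp(words, sim, half_window=2):
    """Mean pairwise similarity of the words in a window around each position."""
    profile = []
    for i in range(len(words)):
        lo, hi = max(0, i - half_window), min(len(words), i + half_window + 1)
        pairs = list(combinations(words[lo:hi], 2))
        profile.append(sum(sim(a, b) for a, b in pairs) / len(pairs)
                       if pairs else 0.0)
    return profile


def boundaries(profile):
    """Positions that are local minima (the first point of a flat valley)."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] < profile[i - 1] and profile[i] <= profile[i + 1]]


if __name__ == "__main__":
    topic = {"cat": 0, "dog": 0, "pet": 0, "fur": 0,
             "stock": 1, "bond": 1, "market": 1, "trade": 1}
    sim = lambda a, b: 1 if topic[a] == topic[b] else 0  # toy stand-in
    words = ["cat", "dog", "pet", "fur", "stock", "bond", "market", "trade"]
    profile = lcp(words, sim)
    print(profile, boundaries(profile))
```

In the toy text the profile dips where the window straddles the switch from the animal words to the finance words, so the detected minimum falls at the topic change.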
Abstract: This paper proposes a method for measuring semantic similarity between words as a new tool for text analysis. The similarity is measured on a semantic network constructed systematically from a subset of an English dictionary, LDOCE (Longman Dictionary of Contemporary English). Spreading activation on the network can directly compute the similarity between any two words in the Longman Defining Vocabulary, and indirectly the similarity of all the other words in LDOCE. The similarity represents the strength of lexical cohesion or semantic relation, and also provides valuable information about the similarity and coherence of texts.
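The abstract does not spell out the spreading-activation rule, so the sketch below assumes a simple one: the source word is re-injected with activation 1 at every step, and every active node passes a decayed share of its activation equally to its neighbours. The similarity of w to w' is read off as the activation that reaches w'. The graph format, the functions `spread` and `similarity`, and the parameters `steps` and `decay` are hypothetical, and the toy chain graph merely stands in for the LDOCE-derived network.

```python
def spread(graph, source, steps=5, decay=0.5):
    """Spread activation from `source` over an undirected graph.

    graph -- dict mapping each node to a list of neighbouring nodes
    Returns a dict of activation levels after `steps` iterations.
    """
    act = {n: 0.0 for n in graph}
    act[source] = 1.0
    for _ in range(steps):
        new = {n: 0.0 for n in graph}
        new[source] = 1.0  # re-inject full activation at the source
        for node, a in act.items():
            if a == 0.0 or not graph[node]:
                continue  # nothing to pass on
            share = decay * a / len(graph[node])
            for nb in graph[node]:
                new[nb] += share
        act = new
    return act


def similarity(graph, w1, w2, steps=5, decay=0.5):
    """Hypothetical similarity: activation at w2 after spreading from w1."""
    return spread(graph, w1, steps, decay)[w2]


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
             "d": ["c"], "e": []}
    for w in "bcde":
        print(w, similarity(graph, "a", w))
```

Activation falls off with path length and never reaches a disconnected node. Because each node divides its activation by its degree, this particular measure is not symmetric in general; the abstract does not say how the original method treats that.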