ITRI, University of Brighton
Abstract:We show that a general model of lexical information conforms to an abstract model that reflects the hierarchy of information found in a typical dictionary entry. We show that this model can be mapped into a well-formed XML document, and how the XSL transformation language can be used to implement a semantics defined over the abstract model to enable extraction and manipulation of the information in any format.
Abstract:Word sense disambiguation has developed as a sub-area of natural language processing, as if, like parsing, it was a well-defined task which was a pre-requisite to a wide range of language-understanding applications. First, I review earlier work which shows that a set of senses for a word is only ever defined relative to a particular human purpose, and that a view of word senses as part of the linguistic furniture lacks theoretical underpinnings. Then, I investigate whether and how word sense ambiguity is in fact a problem for different varieties of NLP application.
Abstract:Lexicon acquisition from machine-readable dictionaries and corpora is currently a dynamic field of research, yet it is often not clear how lexical information so acquired can be used, or how it relates to structured meaning representations. In this paper I look at this issue in relation to Information Extraction (hereafter IE), and one subtask for which both lexical and general knowledge are required, Word Sense Disambiguation (WSD). The analysis is based on the widely-used, but little-discussed distinction between an IE system's foreground lexicon, containing the domain's key terms which map onto the database fields of the output formalism, and the background lexicon, containing the remainder of the vocabulary. For the foreground lexicon, human lexicography is required. For the background lexicon, automatic acquisition is appropriate. For the foreground lexicon, WSD will occur as a by-product of finding a coherent semantic interpretation of the input. WSD techniques as discussed in recent literature are suited only to the background lexicon. Once the foreground/background distinction is developed, there is a match between what is possible, given the state of the art in WSD, and what is required, for high-quality IE.
Abstract:Word sense disambiguation assumes word senses. Within the lexicography and linguistics literature, they are known to be very slippery entities. The paper looks at problems with existing accounts of `word sense' and describes the various kinds of ways in which a word's meaning can deviate from its core meaning. An analysis is presented in which word senses are abstractions from clusters of corpus citations, in accordance with current lexicographic practice. The corpus citations, not the word senses, are the basic objects in the ontology. The corpus citations will be clustered into senses according to the purposes of whoever or whatever does the clustering. In the absence of such purposes, word senses do not exist. Word sense disambiguation also needs a set of word senses to disambiguate between. In most recent work, the set has been taken from a general-purpose lexical resource, with the assumption that the lexical resource describes the word senses of English/French/..., between which NLP applications will need to disambiguate. The implication of the paper is, by contrast, that word senses exist only relative to a task.