University of Sussex
Abstract:We describe a recently developed corpus annotation scheme for evaluating parsers that avoids shortcomings of current methods. The scheme encodes grammatical relations between heads and dependents, and has been used to mark up a new public-domain corpus of naturally occurring English text. We show how the corpus can be used to evaluate the accuracy of a robust parser, and relate the corpus to extant resources.
Abstract:We propose a parser for constraint-logic grammars implementing HPSG that combines the advantages of dynamic bottom-up and advanced top-down control. The parser allows the user to apply magic compilation to specific constraints in a grammar which as a result can be processed dynamically in a bottom-up and goal-directed fashion. State of the art top-down processing techniques are used to deal with the remaining constraints. We discuss various aspects concerning the implementation of the parser as part of a grammar development system.
Abstract:Research into the automatic acquisition of lexical information from corpora is starting to produce large-scale computational lexicons containing data on the relative frequencies of subcategorisation alternatives for individual verbal predicates. However, the empirical question of whether this type of frequency information can in practice improve the accuracy of a statistical parser has not yet been answered. In this paper we describe an experiment with a wide-coverage statistical grammar and parser for English and subcategorisation frequencies acquired from ten million words of text which shows that this information can significantly improve parse accuracy.
Abstract:Off-line compilation of logic grammars using Magic allows an incorporation of filtering into the logic underlying the grammar. The explicit definite clause characterization of filtering resulting from Magic compilation allows processor independent and logically clean optimizations of dynamic bottom-up processing with respect to goal-directedness. Two filter optimizations based on the program transformation technique of Unfolding are discussed which are of practical and theoretical interest.
Abstract:We investigate the use of a technique developed in the constraint programming community called constraint propagation to automatically make a HPSG theory more specific at those places where linguistically motivated underspecification would lead to inefficient processing. We discuss two concrete HPSG examples showing how off-line constraint propagation helps improve processing efficiency.
Abstract:We describe a compiler which translates a set of HPSG lexical rules and their interaction into definite relations used to constrain lexical entries. The compiler ensures automatic transfer of properties unchanged by a lexical rule. Thus an operational semantics for the full lexical rule mechanism as used in HPSG linguistics is provided. Program transformation techniques are used to advance the resulting encoding. The final output constitutes a computational counterpart of the linguistic generalizations captured by lexical rules and allows ``on the fly'' application.
Abstract:A novel approach to HPSG based natural language processing is described that uses an off-line compiler to automatically prime a declarative grammar for generation or parsing, and inputs the primed grammar to an advanced Earley-style processor. This way we provide an elegant solution to the problems with empty heads and efficient bidirectional processing which is illustrated for the special case of HPSG generation. Extensive testing with a large HPSG grammar revealed some important constraints on the form of the grammar.