Abstract:Existing probabilistic scanners and parsers impose hard constraints on the way lexical and syntactic ambiguities can be resolved. Furthermore, traditional grammar-based parsing tools are limited in the mechanisms they allow for taking context into account. In this paper, we propose a model-driven tool that allows for statistical language models with arbitrary probability estimators. Our work on model-driven probabilistic parsing is built on top of ModelCC, a model-based parser generator, and enables the probabilistic interpretation and resolution of anaphoric, cataphoric, and recursive references in the disambiguation of abstract syntax graphs. In order to prove the expression power of ModelCC, we describe the design of a general-purpose natural language parser.
Abstract:Lexical ambiguities naturally arise in languages. We present Lamb, a lexical analyzer that produces a lexical analysis graph describing all the possible sequences of tokens that can be found within the input string. Parsers can process such lexical analysis graphs and discard any sequence of tokens that does not produce a valid syntactic sentence, therefore performing, together with Lamb, a context-sensitive lexical analysis in lexically-ambiguous language specifications.
Abstract:Traditional language processing tools constrain language designers to specific kinds of grammars. In contrast, model-based language specification decouples language design from language processing. As a consequence, model-based language specification tools need general parsers able to parse unrestricted context-free grammars. As languages specified following this approach may be ambiguous, parsers must deal with ambiguities. Model-based language specification also allows the definition of associativity, precedence, and custom constraints. Therefore parsers generated by model-driven language specification tools need to enforce constraints. In this paper, we propose Fence, an efficient bottom-up chart parser with lexical and syntactic ambiguity support that allows the specification of constraints and, therefore, enables the use of model-based language specification in practice.
Abstract:Model-based language specification has applications in the implementation of language processors, the design of domain-specific languages, model-driven software development, data integration, text mining, natural language processing, and corpus-based induction of models. Model-based language specification decouples language design from language processing and, unlike traditional grammar-driven approaches, which constrain language designers to specific kinds of grammars, it needs general parser generators able to deal with ambiguities. In this paper, we propose Fence, an efficient bottom-up parsing algorithm with lexical and syntactic ambiguity support that enables the use of model-based language specification in practice.