Generalized probabilistic LR parsing of natural language (corpora) with unification-based grammars
We describe work toward the construction of a very wide-coverage probabilistic parsing system for natural language (NL), based on LR parsing techniques. The system is intended to rank the large number of syntactic analyses produced by NL grammars according to the frequency of occurrence of the individual rules deployed in each analysis. We discuss a fully automatic procedure for constructing an LR parse table from a unification-based grammar formalism, and consider the suitability of alternative LALR(1) parse table construction methods for large grammars. The parse table is used as the basis for two parsers: a user-driven interactive system, which provides a computationally tractable and labor-efficient method of supervised training of the statistical information required to drive the probabilistic parser, and the probabilistic parser itself. The latter is constructed by associating probabilities with the LR parse table directly. This technique is superior to parsers based on probabilistic lexical tagging or probabilistic context-free grammar because it allows for a more context-dependent probabilistic language model, as well as use of a more linguistically adequate grammar formalism. We compare the performance of an optimized variant of Tomita's (1987) generalized LR parsing algorithm to an (efficiently indexed and optimized) chart parser. We report promising results of a pilot study training on 150 noun definitions from the Longman Dictionary of Contemporary English (LDOCE) and retesting on these plus a further 55 definitions. Finally, we discuss limitations of the current system and possible extensions to deal with lexical (syntactic and semantic) frequency of occurrence.
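To make the idea of attaching probabilities to the LR parse table concrete, the following is a minimal sketch and not the authors' implementation. It assumes, purely for illustration, that each training derivation is logged as a list of (state, action) pairs, that action probabilities are estimated by relative frequency within each state, and that competing analyses are ranked by the product of their action probabilities. The function names and the floor value for unseen actions are hypothetical; the abstract does not specify the estimation procedure.

```python
from collections import Counter, defaultdict
from math import log


def estimate_action_probs(training_derivations):
    """Relative-frequency estimate of P(action | LR state).

    Assumption: each derivation is a list of (state, action) pairs, e.g. as
    might be recorded while a user picks the correct analysis interactively.
    """
    counts = defaultdict(Counter)
    for derivation in training_derivations:
        for state, action in derivation:
            counts[state][action] += 1
    return {
        state: {a: n / sum(actions.values()) for a, n in actions.items()}
        for state, actions in counts.items()
    }


def log_score(derivation, probs, floor=1e-6):
    """Sum of log action probabilities; unseen actions get a small floor
    (the floor is an illustrative choice, not taken from the source)."""
    return sum(log(probs.get(s, {}).get(a, floor)) for s, a in derivation)


def rank(derivations, probs):
    """Order candidate analyses from most to least probable."""
    return sorted(derivations, key=lambda d: log_score(d, probs), reverse=True)
```

Because the probabilities are conditioned on the LR state rather than on the grammar rule alone, the same rule can receive different probabilities in different parse contexts, which is the sense in which such a model is more context-dependent than a probabilistic context-free grammar.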
books.google.com