Computer Science – Computation and Language
Scientific paper
1996-04-22
Proceedings of the Conference on Empirical Methods in Natural Language Processing, May 1996
10 pages
Excellent results have been reported for Data-Oriented Parsing (DOP) of natural language texts (Bod, 1993). Unfortunately, existing algorithms are both computationally intensive and difficult to implement. Previous algorithms are expensive due to two factors: the exponential number of rules that must be generated and the use of a Monte Carlo parsing algorithm. In this paper we solve the first problem by a novel reduction of the DOP model to a small, equivalent probabilistic context-free grammar. We solve the second problem by a novel deterministic parsing strategy that maximizes the expected number of correct constituents, rather than the probability of a correct parse tree. Using the optimizations, experiments yield a 97% crossing brackets rate and 88% zero crossing brackets rate. This differs significantly from the results reported by Bod, and is comparable to results from a duplication of Pereira and Schabes's (1992) experiment on the same data. We show that Bod's results are at least partially due to an extremely fortuitous choice of test data, and partially due to using cleaner data than other researchers.
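The second idea in the abstract, choosing the parse that maximizes the expected number of correct constituents rather than the probability of the whole tree, can be illustrated with a small dynamic program. This is only a minimal sketch under stated assumptions, not the paper's implementation: the function name `max_expected_constituents` and the `posterior` table are hypothetical, category labels are omitted for brevity, and the span posteriors are assumed to have been computed already (for example by an inside-outside pass over the reduced PCFG, which is not shown).

```python
from functools import lru_cache


def max_expected_constituents(n, posterior):
    """Return (score, spans) for the binary bracketing of an n-word sentence
    that maximizes the summed posterior probability of its spans.

    posterior: dict mapping (start, end) to the probability that some
    constituent covers words[start:end]. Hypothetical input; missing
    spans are treated as probability 0.
    """

    @lru_cache(maxsize=None)
    def best(start, end):
        # Expected gain from proposing this span as a constituent.
        score = posterior.get((start, end), 0.0)
        if end - start == 1:
            return score, ((start, end),)
        # Try every split point and keep the best pair of sub-bracketings.
        candidates = []
        for mid in range(start + 1, end):
            left_score, left_spans = best(start, mid)
            right_score, right_spans = best(mid, end)
            candidates.append((left_score + right_score, left_spans + right_spans))
        sub_score, sub_spans = max(candidates, key=lambda c: c[0])
        return score + sub_score, ((start, end),) + sub_spans

    return best(0, n)


# Toy three-word sentence with made-up posteriors (illustrative only).
toy_posterior = {
    (0, 3): 1.0, (0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0,
    (0, 2): 0.3, (1, 3): 0.6,
}
score, spans = max_expected_constituents(3, toy_posterior)
print(score, spans)
```

Because the objective is a sum over spans, the search is deterministic and runs in cubic time in the sentence length, which is the kind of contrast with Monte Carlo sampling that the abstract draws. The selected bracketing need not coincide with the single most probable tree.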
Efficient Algorithms for Parsing the DOP Model