Efficient Algorithms for Parsing the DOP Model

Computer Science – Computation and Language

Scientific paper

Details

10 pages

Excellent results have been reported for Data-Oriented Parsing (DOP) of natural language texts (Bod, 1993). Unfortunately, existing algorithms are both computationally intensive and difficult to implement. Previous algorithms are expensive due to two factors: the exponential number of rules that must be generated and the use of a Monte Carlo parsing algorithm. In this paper we solve the first problem by a novel reduction of the DOP model to a small, equivalent probabilistic context-free grammar. We solve the second problem by a novel deterministic parsing strategy that maximizes the expected number of correct constituents, rather than the probability of a correct parse tree. Using the optimizations, experiments yield a 97% crossing brackets rate and 88% zero crossing brackets rate. This differs significantly from the results reported by Bod, and is comparable to results from a duplication of Pereira and Schabes's (1992) experiment on the same data. We show that Bod's results are at least partially due to an extremely fortuitous choice of test data, and partially due to using cleaner data than other researchers.
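The second idea in the abstract, choosing the parse that maximizes the expected number of correct constituents instead of the single most probable tree, can be illustrated with a small dynamic program. The sketch below is not the paper's implementation. It is a simplified, unlabeled version that assumes the posterior probability of each span being a constituent has already been computed (for example by an inside-outside pass over the reduced grammar). The function name `max_expected_constituents` and the `posterior` dictionary are hypothetical names introduced here for illustration.

```python
from functools import lru_cache


def max_expected_constituents(n, posterior):
    """Pick the binary bracketing of a sentence of n words that maximizes
    the sum of span posteriors.

    posterior maps (start, end) spans, with 0 <= start < end <= n, to the
    assumed probability that the span is a constituent. Missing spans
    count as probability 0.0.
    Returns (expected_correct_constituents, tuple_of_chosen_spans).
    """

    @lru_cache(maxsize=None)
    def best(i, j):
        own = posterior.get((i, j), 0.0)
        if j - i == 1:
            # A single word: the only available bracketing is the span itself.
            return own, ((i, j),)
        best_score, best_spans = -1.0, ()
        for k in range(i + 1, j):
            # Try every split point and keep the highest-scoring pair of subtrees.
            left_score, left_spans = best(i, k)
            right_score, right_spans = best(k, j)
            if left_score + right_score > best_score:
                best_score = left_score + right_score
                best_spans = left_spans + right_spans
        return own + best_score, ((i, j),) + best_spans

    return best(0, n)


# Illustrative, made-up posteriors for a three-word sentence.
example_posterior = {
    (0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0,
    (0, 2): 0.3, (1, 3): 0.6, (0, 3): 1.0,
}
score, spans = max_expected_constituents(3, example_posterior)
```

In this toy input the span (1, 3) has a higher posterior than (0, 2), so the chosen bracketing groups the last two words. The objective sums per-constituent probabilities, so the selected tree need not be the one with the highest whole-tree probability.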
