Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification

Computer Science – Computation and Language

Scientific paper

Details

7 pages; 2 eps figures; uses epsf, colacl

Finding simple, non-recursive, base noun phrases is an important subtask for many natural language processing applications. While previous empirical methods for base NP identification have been rather complex, this paper instead proposes a very simple algorithm that is tailored to the relative simplicity of the task. In particular, we present a corpus-based approach for finding base NPs by matching part-of-speech tag sequences. The training phase of the algorithm is based on two successful techniques: first the base NP grammar is read from a "treebank" corpus; then the grammar is improved by selecting rules with high "benefit" scores. Using this simple algorithm with a naive heuristic for matching rules, we achieve surprising accuracy in an evaluation on the Penn Treebank Wall Street Journal.
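
The two training steps and the matching step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not state: the "benefit" of a rule is taken to be its number of correct matches minus its number of incorrect matches on the training corpus, the "naive heuristic" is taken to be greedy left-to-right longest match, pruning is done in a single pass, and all names (`extract_rules`, `match`, `benefit_scores`, `prune`, `threshold`, `toy_corpus`) are hypothetical. The toy corpus is invented for illustration only.

```python
from collections import Counter

def extract_rules(corpus):
    """Read the grammar: every POS-tag sequence bracketed as a base NP."""
    rules = set()
    for tags, spans in corpus:
        for start, end in spans:
            rules.add(tuple(tags[start:end]))
    return rules

def match(tags, rules):
    """Assumed heuristic: scan left to right, taking the longest matching rule."""
    max_len = max((len(r) for r in rules), default=0)
    found, i = [], 0
    while i < len(tags):
        for n in range(min(max_len, len(tags) - i), 0, -1):
            if tuple(tags[i:i + n]) in rules:
                found.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no rule starts here; move on one token
    return found

def benefit_scores(corpus, rules):
    """Assumed benefit: +1 per correct bracket, -1 per incorrect bracket."""
    scores = Counter()
    for tags, spans in corpus:
        gold = set(spans)
        for start, end in match(tags, rules):
            rule = tuple(tags[start:end])
            scores[rule] += 1 if (start, end) in gold else -1
    return scores

def prune(corpus, rules, threshold=1):
    """Keep only rules whose benefit reaches the (hypothetical) threshold."""
    scores = benefit_scores(corpus, rules)
    return {r for r in rules if scores[r] >= threshold}

# Invented toy corpus: (POS tags, gold base-NP spans as [start, end) indices).
toy_corpus = [
    (["DT", "NN", "VBD", "DT", "JJ", "NN"], [(0, 2), (3, 6)]),
    (["PRP", "VBD", "VBG", "NN"], [(0, 1), (2, 4)]),  # VBG NN is a base NP
    (["PRP", "VBD", "VBG", "NN"], [(0, 1), (3, 4)]),  # VBG is a verb here
    (["NN", "VBZ", "VBG", "NN"], [(0, 1), (3, 4)]),   # VBG is a verb here
]

rules = extract_rules(toy_corpus)
pruned = prune(toy_corpus, rules)
brackets = match(["PRP", "VBD", "VBG", "NN"], pruned)
```

In this toy corpus the rule `("VBG", "NN")` is read from the treebank but produces more wrong brackets than right ones, so it is dropped under the assumed benefit definition, and the pruned grammar then brackets only the pronoun and the final noun in the last call.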
