Computer Science – Computation and Language
Scientific paper
1998-08-26
Proceedings of COLING-ACL'98, pages 218-224.
7 pages; 2 eps figures; uses epsf, colacl
Finding simple, non-recursive, base noun phrases is an important subtask for many natural language processing applications. While previous empirical methods for base NP identification have been rather complex, this paper instead proposes a very simple algorithm that is tailored to the relative simplicity of the task. In particular, we present a corpus-based approach for finding base NPs by matching part-of-speech tag sequences. The training phase of the algorithm is based on two successful techniques: first the base NP grammar is read from a "treebank" corpus; then the grammar is improved by selecting rules with high "benefit" scores. Using this simple algorithm with a naive heuristic for matching rules, we achieve surprising accuracy in an evaluation on the Penn Treebank Wall Street Journal.
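The two training steps named in the abstract can be illustrated with a minimal sketch. The abstract does not define the "benefit" score or the matching heuristic, so the code below makes two labeled assumptions: benefit is taken to be the number of correct matches minus the number of incorrect matches of a rule on the training data, and matching is a greedy left-to-right longest match. The function names, the toy corpus, and the threshold value are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def extract_rules(corpus):
    """Read the grammar: one rule per distinct POS-tag sequence of a bracketed base NP."""
    rules = set()
    for tags, spans in corpus:
        for start, end in spans:
            rules.add(tuple(tags[start:end]))
    return rules


def match_longest(tags, rules):
    """ASSUMED heuristic: scan left to right, taking the longest matching rule at each position."""
    max_len = max((len(r) for r in rules), default=0)
    spans, i = [], 0
    while i < len(tags):
        for n in range(min(max_len, len(tags) - i), 0, -1):
            if tuple(tags[i:i + n]) in rules:
                spans.append((i, i + n))
                i += n
                break
        else:
            i += 1  # no rule starts here; move on
    return spans


def prune(rules, corpus, threshold=0):
    """ASSUMED benefit: correct matches minus incorrect matches; keep rules at or above threshold."""
    benefit = Counter()
    for tags, gold in corpus:
        gold_set = set(gold)
        for start, end in match_longest(tags, rules):
            rule = tuple(tags[start:end])
            benefit[rule] += 1 if (start, end) in gold_set else -1
    return {r for r in rules if benefit[r] >= threshold}


# Toy corpus (hypothetical): each item is (POS tags, gold base NP spans as [start, end)).
training = [
    (["DT", "NN", "VBD", "DT", "JJ", "NN"], [(0, 2), (3, 6)]),
    (["VBG", "NN", "VBZ", "JJ"], [(0, 2)]),
    (["PRP", "VBZ", "VBG", "NN"], [(0, 1), (3, 4)]),
    (["PRP", "VBD", "VBG", "NN"], [(0, 1), (3, 4)]),
]

rules = extract_rules(training)
pruned = prune(rules, training, threshold=0)
```

In this toy corpus the rule `VBG NN` matches correctly once and incorrectly twice, so under the assumed score it is dropped, while the remaining rules are kept. A real system would tune the threshold on held-out data and might repeat the pruning step, since removing one rule changes which other rules get to match.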
Cardie Claire
Pierce David
Title: Error-Driven Pruning of Treebank Grammars for Base Noun Phrase Identification