Fast Statistical Parsing of Noun Phrases for Document Indexing

Computer Science – Computation and Language

Scientific paper

Details

8 pages, LaTeX; uses aclap.sty. To appear in Proceedings of the 5th Conference on Applied Natural Language Processing, Washing

Information Retrieval (IR) is an important application area of Natural Language Processing (NLP), one that poses the genuine challenge of processing large quantities of unrestricted natural language text. Although much effort has gone into applying NLP techniques to IR, very few of these techniques have been evaluated on a document collection larger than several megabytes. Many are simply neither efficient nor robust enough to handle a large amount of text. This paper proposes a new probabilistic model for noun phrase parsing and reports on the application of this parsing technique to enhance document indexing. The effectiveness of using syntactic phrases provided by the parser to supplement single words for indexing is evaluated on a 250-megabyte document collection. The experimental results show that supplementing single words with syntactic phrases for indexing consistently and significantly improves retrieval performance.
