Generalizing Case Frames Using a Thesaurus and the MDL Principle

Computer Science – Computation and Language

Scientific paper


Details

11 pages, uuencoded compressed postscript, a revised version

We address the problem of automatically acquiring case-frame patterns from large corpus data. In particular, we view this problem as that of estimating a (conditional) distribution over a partition of words, and propose a new generalization method based on the MDL (Minimum Description Length) principle. For efficiency, our method makes use of an existing thesaurus and restricts its attention to those partitions that are present as 'cuts' in the thesaurus tree, thus reducing the generalization problem to that of estimating the 'tree cut models' of the thesaurus. We then give an efficient algorithm that provably obtains the optimal tree cut model for the given frequency data, in the sense of MDL. We have used the case-frame patterns obtained using our method to resolve PP-attachment ambiguity. Our experimental results indicate that our method improves upon, or is at least as effective as, existing methods.
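The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a bottom-up search for an MDL-optimal tree cut might look, not the authors' implementation. It assumes a common two-part code: a parameter cost of (k - 1)/2 · log2 N bits for a cut with k classes and N observations, plus a data cost in which each class's probability is shared uniformly among the words under it. The names `Node`, `find_mdl`, and `description_length`, the toy thesaurus, and its counts are all hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A thesaurus node; only leaves (words) carry observed counts."""
    name: str
    children: list = field(default_factory=list)
    count: int = 0


def leaves(node):
    """Return all leaf nodes (words) dominated by `node`."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def freq(node):
    """Total observed frequency of the words under `node`."""
    return sum(leaf.count for leaf in leaves(node))


def description_length(cut, total):
    """Two-part code length in bits for a cut (assumed coding scheme)."""
    # Parameter cost: a cut with k classes has k - 1 free parameters.
    param_bits = (len(cut) - 1) / 2 * math.log2(total)
    # Data cost: class probability f/N, shared uniformly among its words.
    data_bits = 0.0
    for cls in cut:
        f = freq(cls)
        if f > 0:
            p_word = (f / total) / len(leaves(cls))
            data_bits -= f * math.log2(p_word)
    return param_bits + data_bits


def find_mdl(node, total):
    """Return the cut of the subtree at `node` with the shortest code length."""
    if not node.children:
        return [node]
    # Best cut for each child subtree, concatenated left to right.
    sub_cut = [c for child in node.children for c in find_mdl(child, total)]
    # Collapse to the single class `node` if that is no more expensive.
    if description_length([node], total) <= description_length(sub_cut, total):
        return [node]
    return sub_cut


# Toy thesaurus with illustrative counts for the subject slot of one verb.
bird = Node("BIRD", [Node("swallow", count=0), Node("crow", count=2),
                     Node("eagle", count=2), Node("bird", count=4)])
insect = Node("INSECT", [Node("bug", count=0), Node("bee", count=2),
                         Node("insect", count=0)])
animal = Node("ANIMAL", [bird, insect])

total = freq(animal)
cut = find_mdl(animal, total)
print([n.name for n in cut])
```

Because the code length under this scheme is additive over subtrees (up to a constant), comparing each node against the best cuts of its children is enough to find the overall minimum without enumerating every cut. On the toy data the search generalizes the leaves to the two classes BIRD and INSECT rather than to the single class ANIMAL.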
