Bootstrapping Structure using Similarity

Computer Science – Learning

Scientific paper

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

11 pages

Scientific paper

In this paper a new similarity-based learning algorithm, inspired by string edit-distance (Wagner and Fischer, 1974), is applied to the problem of bootstrapping structure from scratch. The algorithm takes a corpus of unannotated sentences as input and returns a corpus of bracketed sentences. The method works on pairs of unstructured sentences or sentences partially bracketed by the algorithm that have one or more words in common. It finds parts of sentences that are interchangeable (i.e. the parts of the sentences that are different in both sentences). These parts are taken as possible constituents of the same type. While this corresponds to the basic bootstrapping step of the algorithm, further structure may be learned from comparison with other (similar) sentences. We used this method for bootstrapping structure from the flat sentences of the Penn Treebank ATIS corpus, and compared the resulting structured sentences to the structured sentences in the ATIS corpus. Similarly, the algorithm was tested on the OVIS corpus. We obtained 86.04 % non-crossing brackets precision on the ATIS corpus and 89.39 % non-crossing brackets precision on the OVIS corpus.

No associations

LandOfFree

Say what you really think

Search LandOfFree.com for scientists and scientific papers. Rate them and share your experience with other people.

Rating

Bootstrapping Structure using Similarity does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.

If you have personal experience with Bootstrapping Structure using Similarity, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Bootstrapping Structure using Similarity will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFWR-SCP-O-173336

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.