Learning string edit distance

Computer Science – Computation and Language

Scientific paper

Details

http://www.cs.princeton.edu/~ristad/papers/pu-532-96.ps.gz

In many applications, it is necessary to determine the similarity of two strings. A widely used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows us to learn a string edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of learning the pronunciation of words in conversational speech. In this application, we learn a string edit distance with one-fourth the error rate of the untrained Levenshtein distance. Our approach is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.

Keywords: string edit distance, Levenshtein distance, stochastic transduction, syntactic pattern recognition, prototype dictionary, spelling correction, string correction, string similarity, string classification, speech recognition, pronunciation modeling, Switchboard corpus.
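To make the abstract's definition concrete, the sketch below computes the untrained, unit-cost Levenshtein distance (the baseline named in the abstract) and uses it to classify a query string against labeled prototypes. It is not the learned stochastic model from the report, whose details are not given here; the function names `edit_distance` and `classify` and the example prototype dictionary are hypothetical, chosen only for illustration.

```python
from typing import Dict


def edit_distance(source: str, target: str) -> int:
    """Unit-cost Levenshtein distance: the minimum number of insertions,
    deletions, and substitutions needed to turn `source` into `target`."""
    # previous[j] holds the distance between the processed prefix of
    # `source` and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # deleting all i characters yields the empty prefix
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def classify(query: str, prototypes: Dict[str, str]) -> str:
    """Return the label of the prototype closest to `query`.

    `prototypes` maps prototype strings to labels (a hypothetical layout,
    not one taken from the report). Ties go to the first prototype seen.
    """
    nearest = min(prototypes, key=lambda proto: edit_distance(query, proto))
    return prototypes[nearest]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    # Illustrative prototype dictionary; the labels are made up.
    print(classify("helo", {"hello": "greeting", "goodbye": "farewell"}))
```

In this sketch every edit operation costs 1. A learned distance of the kind the abstract describes would instead assign costs estimated from a corpus of examples, which this code does not attempt.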
