Computer Science – Computation and Language
Scientific paper
1996-10-29
http://www.cs.princeton.edu/~ristad/papers/pu-532-96.ps.gz
Learning string edit distance
In many applications, it is necessary to determine the similarity of two strings. A widely used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string edit distance. Our stochastic model allows us to learn a string edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of learning the pronunciation of words in conversational speech. In this application, we learn a string edit distance with one-fourth the error rate of the untrained Levenshtein distance. Our approach is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.
Keywords: string edit distance, Levenshtein distance, stochastic transduction, syntactic pattern recognition, prototype dictionary, spelling correction, string correction, string similarity, string classification, speech recognition, pronunciation modeling, Switchboard corpus.
Ristad Eric Sven
Yianilos Peter N.
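The two ideas the abstract relies on, the untrained Levenshtein distance and classification against a database of labeled prototypes, can be illustrated with a minimal Python sketch. This is not the authors' stochastic model: it uses the standard unit-cost dynamic program as the baseline distance, and the function names `levenshtein` and `classify`, as well as the `(string, label)` prototype format, are assumptions made only for illustration.

```python
def levenshtein(source, target):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `source` into `target` (unit cost per edit)."""
    # previous[j] holds the distance between the first i-1 symbols of
    # source and the first j symbols of target.
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]  # turning i source symbols into "" takes i deletions
        for j, t in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,             # delete s
                current[j - 1] + 1,          # insert t
                previous[j - 1] + (s != t),  # substitute s with t, or match
            ))
        previous = current
    return previous[-1]


def classify(query, prototypes, distance=levenshtein):
    """Return the label of the prototype nearest to `query`.

    `prototypes` is a list of (string, label) pairs; this layout is a
    hypothetical choice for the sketch, not taken from the paper.
    """
    _, label = min(prototypes, key=lambda pair: distance(query, pair[0]))
    return label
```

Because `classify` accepts any function as `distance`, a distance learned from a corpus could in principle be passed in place of the untrained baseline without changing the classifier itself.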