Letter to Sound Rules for Accented Lexicon Compression

Computer Science – Computation and Language

Scientific paper


Details

4 pages, 1 figure

This paper presents trainable methods for generating letter-to-sound rules from a given lexicon, both for pronouncing out-of-vocabulary words and as a method of lexicon compression. Because the relationship between a string of letters and the string of phonemes representing its pronunciation is not trivial for many languages, we discuss two alignment procedures, one fully automatic and one hand-seeded, which produce reasonable alignments of letters to phones. Top Down Induction Tree models are trained on the aligned entries. We show that combined phoneme/stress prediction is better than separate prediction processes, and better still when the model also includes the last phonemes transcribed and part-of-speech information. For the lexicons we have tested, our models have a word accuracy (including stress) of 78% for OALD, 62% for CMU and 94% for BRULEX. The extremely high scores on the training sets allow substantial size reductions (more than 1/20). WWW site: http://tcts.fpms.ac.be/synthesis/mbrdico
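
The compression idea in the abstract can be illustrated with a minimal sketch: learn rules from an aligned lexicon, then store only the entries the rules get wrong. This is not the authors' implementation. In place of the tree models named in the abstract, it uses a much simpler lookup table over a three-letter window, backing off to the bare letter. The toy lexicon, the phone symbols, the `_` epsilon marker, and every function name are assumptions made for illustration, and the alignment is supplied by hand rather than computed.

```python
from collections import Counter, defaultdict

EPS = "_"  # assumed marker for a letter that maps to no phone

# Hypothetical hand-aligned toy lexicon: exactly one phone symbol per letter.
ALIGNED = {
    "cat":  ["k", "ae", "t"],
    "cab":  ["k", "ae", "b"],
    "city": ["s", "ih", "t", "iy"],
    "cent": ["s", "eh", "n", "t"],
    "back": ["b", "ae", "k", EPS],
    "gift": ["g", "ih", "f", "t"],
    "gin":  ["jh", "ih", "n"],
}

def contexts(word, i):
    """Contexts for letter i, most specific first: 3-letter window, then the letter."""
    padded = "#" + word + "#"
    return [padded[i:i + 3], word[i]]

def train(aligned):
    """Map each context to the phone most often aligned with its centre letter."""
    counts = defaultdict(Counter)
    for word, phones in aligned.items():
        for i, phone in enumerate(phones):
            for ctx in contexts(word, i):
                counts[ctx][phone] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

def predict(rules, word):
    """Predict one phone (or EPS) per letter, backing off to less specific contexts."""
    out = []
    for i in range(len(word)):
        for ctx in contexts(word, i):
            if ctx in rules:
                out.append(rules[ctx])
                break
        else:
            out.append(EPS)  # letter never seen in training
    return out

def compress(aligned, rules):
    """Keep only the entries that the rules fail to reproduce exactly."""
    return {w: p for w, p in aligned.items() if predict(rules, w) != p}

def lookup(word, rules, exceptions):
    """Pronounce a word: stored exception if present, otherwise the rules."""
    phones = exceptions[word] if word in exceptions else predict(rules, word)
    return [p for p in phones if p != EPS]

rules = train(ALIGNED)
exceptions = compress(ALIGNED, rules)
```

In this toy setup, `gift` and `gin` share the window `#gi` but differ in the first phone, so whichever reading the table prefers, the other word has to be kept as an explicit exception. Every other entry is recovered from the rules alone, and `lookup` also returns a guess for words outside the lexicon.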
