An Empirical Study of Smoothing Techniques for Language Modeling

Computer Science – Computation and Language

Scientific paper

Details

9 pages, LaTeX, uses aclap.sty

We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods.
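The abstract refers to Jelinek-Mercer smoothing and to cross-entropy on test data without defining either. The sketch below is a rough illustration only. It linearly interpolates a maximum-likelihood bigram estimate with a unigram estimate, p(w_i | w_{i-1}) = lam_bi * p_ML(w_i | w_{i-1}) + (1 - lam_bi) * p(w_i). The unigram estimate is in turn interpolated with a uniform distribution. Held-out text is then scored as the average negative log2 probability per word. The helper names train, jm_prob and cross_entropy and the fixed weights lam_bi and lam_uni are hypothetical. In Jelinek-Mercer smoothing the weights are normally estimated on held-out data and may depend on the history. The sketch is not either of the novel methods the paper introduces.

```python
import math
from collections import Counter


def train(tokens, vocab_size):
    """Collect the counts needed for an interpolated bigram model."""
    return {
        "unigrams": Counter(tokens),
        "histories": Counter(tokens[:-1]),  # c(w_{i-1}) counted as a bigram history
        "bigrams": Counter(zip(tokens, tokens[1:])),
        "total": len(tokens),
        "vocab_size": vocab_size,  # assumed to cover every training word
    }


def jm_prob(model, prev, word, lam_bi=0.7, lam_uni=0.9):
    """Interpolated bigram probability with hypothetical fixed weights."""
    p_uniform = 1.0 / model["vocab_size"]
    # Unigram ML estimate mixed with a uniform distribution, so unseen words get mass.
    p_uni = lam_uni * model["unigrams"][word] / model["total"] + (1 - lam_uni) * p_uniform
    hist = model["histories"][prev]
    if hist == 0:
        # Unseen history: the bigram ML estimate is undefined, use the lower-order model.
        return p_uni
    p_ml = model["bigrams"][(prev, word)] / hist
    return lam_bi * p_ml + (1 - lam_bi) * p_uni


def cross_entropy(model, test_tokens, **weights):
    """Average negative log2 probability per predicted test token (bits per word)."""
    log_sum = 0.0
    n = 0
    for prev, word in zip(test_tokens, test_tokens[1:]):
        log_sum += math.log2(jm_prob(model, prev, word, **weights))
        n += 1
    if n == 0:
        raise ValueError("need at least two test tokens")
    return -log_sum / n


# Toy usage on made-up data.
train_tokens = "the cat sat on the mat the dog sat on the rug".split()
vocab = set(train_tokens) | {"bird"}
model = train(train_tokens, vocab_size=len(vocab))
print(cross_entropy(model, "the bird sat on the mat".split()))
```

Because the unigram level mixes in a uniform distribution over vocab_size words, a test word never seen in training (here "bird") still receives nonzero probability. The cross-entropy therefore stays finite. Lower cross-entropy on the same test data indicates a better-smoothed model.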
