Word-to-Word Models of Translational Equivalence

Computer Science – Computation and Language

Scientific paper

Parallel texts (bitexts) have properties that distinguish them from other kinds of parallel data. First, most words translate to only one other word. Second, bitext correspondence is noisy. This article presents methods for biasing statistical translation models to reflect these properties. Analysis of the expected behavior of these biases in the presence of sparse data predicts that they will result in more accurate models. The prediction is confirmed by evaluation with respect to a gold standard: translation models that are biased in this fashion are significantly more accurate than a baseline knowledge-poor model. This article also shows how a statistical translation model can take advantage of various kinds of pre-existing knowledge that might be available about particular language pairs. Even the simplest kinds of language-specific knowledge, such as the distinction between content words and function words, are shown to reliably boost translation model performance on some tasks. Statistical models that are informed by pre-existing knowledge about the model domain combine the best of both the rationalist and empiricist traditions.
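
To make the two properties concrete, here is a minimal Python sketch of one simple way to impose a one-to-one bias on a word-to-word model. It is an illustration under stated assumptions, not necessarily the article's method: the scores are raw sentence-level co-occurrence counts, links are chosen greedily from the highest score down, and the function names and the min_score parameter are hypothetical. The min_score cutoff stands in for the noise property by leaving weakly supported words unlinked.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(bitext):
    """Count how often each (source, target) word pair shares a sentence pair.

    `bitext` is an iterable of (source_tokens, target_tokens) pairs.
    Raw co-occurrence is an assumed stand-in for a real association score.
    """
    counts = Counter()
    for src_sentence, tgt_sentence in bitext:
        for pair in product(set(src_sentence), set(tgt_sentence)):
            counts[pair] += 1
    return counts


def greedy_one_to_one_links(src_sentence, tgt_sentence, scores, min_score=1):
    """Link each word to at most one word on the other side, best score first.

    Pairs scoring below `min_score` (a hypothetical noise threshold) are
    never linked, so some words may stay unlinked.
    """
    candidates = sorted(
        (
            (scores.get((s, t), 0), s, t)
            for s, t in product(set(src_sentence), set(tgt_sentence))
        ),
        reverse=True,
    )
    linked_src, linked_tgt, links = set(), set(), []
    for score, s, t in candidates:
        if score < min_score:
            break  # all remaining candidates score lower still
        if s not in linked_src and t not in linked_tgt:
            links.append((s, t))
            linked_src.add(s)
            linked_tgt.add(t)
    return links
```

Because every word can take part in at most one link, a frequent word such as a determiner cannot absorb links that belong to its less frequent neighbours, which is the intuition behind the first property named in the abstract.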
