Automatic Evaluation and Uniform Filter Cascades for Inducing N-Best Translation Lexicons

Computer Science – Computation and Language

Scientific paper

Details

To appear in Proceedings of the Third Workshop on Very Large Corpora, 15 pages, uuencoded compressed PostScript

This paper shows how to induce an N-best translation lexicon from a bilingual text corpus using statistical properties of the corpus together with four external knowledge sources. The knowledge sources are cast as filters, so that any subset of them can be cascaded in a uniform framework. A new objective evaluation measure is used to compare the quality of lexicons induced with different filter cascades. The best filter cascades improve lexicon quality by up to 137% over the plain vanilla statistical method, and approach human performance. Drastically reducing the size of the training corpus has a much smaller impact on lexicon quality when these knowledge sources are used. This makes it practical to train on small hand-built corpora for language pairs where large bilingual corpora are unavailable. Moreover, three of the four filters prove useful even when used with large training corpora.
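The idea of casting knowledge sources as filters that can be cascaded in any combination can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not name the four knowledge sources or the statistical model, so the co-occurrence counting, the function names (`candidate_pairs`, `cascade`, `induce_lexicon`), and the closed-class filter below are all assumptions chosen only to show the uniform interface, in which every filter maps a list of candidate word pairs to a (usually smaller) list of candidate word pairs.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[str, str]                       # (source word, target word)
Filter = Callable[[List[Pair]], List[Pair]]  # uniform filter signature
Segment = Tuple[List[str], List[str]]        # one aligned segment pair


def candidate_pairs(src: List[str], tgt: List[str]) -> List[Pair]:
    """All source/target word pairings within one aligned segment."""
    return [(s, t) for s in src for t in tgt]


def cascade(filters: Sequence[Filter]) -> Filter:
    """Compose any subset of filters; an empty cascade changes nothing."""
    def run(pairs: List[Pair]) -> List[Pair]:
        for f in filters:
            pairs = f(pairs)
        return pairs
    return run


def closed_class_filter(pairs: List[Pair]) -> List[Pair]:
    """Hypothetical filter (an assumption, not from the paper): keep a
    pair only if both words are function words or neither is."""
    closed_src, closed_tgt = {"the"}, {"la"}  # toy word lists
    return [(s, t) for s, t in pairs
            if (s in closed_src) == (t in closed_tgt)]


def induce_lexicon(bitext: Sequence[Segment],
                   filters: Sequence[Filter],
                   n: int = 2) -> Dict[str, List[str]]:
    """Count surviving co-occurrences and keep the n best per source word.
    Raw co-occurrence counts stand in for the paper's statistical method."""
    run = cascade(filters)
    counts: Dict[str, Counter] = {}
    for src, tgt in bitext:
        for s, t in run(candidate_pairs(src, tgt)):
            counts.setdefault(s, Counter())[t] += 1
    return {s: [t for t, _ in c.most_common(n)] for s, c in counts.items()}


# Toy bitext, invented for illustration only.
bitext = [(["the", "house"], ["la", "maison"]),
          (["the", "car"], ["la", "voiture"])]

unfiltered = induce_lexicon(bitext, filters=[])
filtered = induce_lexicon(bitext, filters=[closed_class_filter])
```

Because every filter shares one signature, adding, removing, or reordering knowledge sources only changes the list passed to `induce_lexicon`, which is what makes it straightforward to compare lexicons induced with different cascades.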
