Automated Postediting of Documents

Computer Science – Computation and Language

Scientific paper

Details

6 pages, compressed and uuencoded PostScript. To appear in AAAI-94.

Large amounts of low- to medium-quality English texts are now being produced by machine translation (MT) systems, optical character readers (OCR), and non-native speakers of English. Most of this text must be postedited by hand before it sees the light of day. Improving text quality is tedious work, but its automation has not received much research attention. Anyone who has postedited a technical report or thesis written by a non-native speaker of English knows the potential of an automated postediting system. For the case of MT-generated text, we argue for the construction of postediting modules that are portable across MT systems, as an alternative to hardcoding improvements inside any one system. As an example, we have built a complete self-contained postediting module for the task of article selection (a, an, the) for English noun phrases. This is a notoriously difficult problem for Japanese-English MT. Our system contains over 200,000 rules derived automatically from online text resources. We report on learning algorithms, accuracy, and comparisons with human performance.
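To make the idea of a self-contained article-selection postediting step concrete, here is a minimal toy sketch. It is not the system described above: the rule table is hand-written and hypothetical (the paper's rules are derived automatically from text and are not reproduced here), the head noun is assumed to be the last word of the phrase, "a" and "an" are treated as one class resolved by a crude first-letter check, and the names `RULES`, `DEFAULT`, `indefinite_form`, and `choose_article` are invented for illustration.

```python
# Toy article selection for an English noun phrase that arrives without an article.
# RULES is a hypothetical, hand-written table; it only illustrates the lookup shape.
RULES = {
    "sun": "the",
    "report": "a/an",
    "university": "a/an",
}
DEFAULT = "the"  # assumed fallback when no rule matches


def indefinite_form(next_word: str) -> str:
    """Pick 'a' or 'an' from the first letter (a crude stand-in for the first sound)."""
    return "an" if next_word[0].lower() in "aeiou" else "a"


def choose_article(noun_phrase: str) -> str:
    """Prefix a non-empty noun phrase with an article chosen by head-noun lookup."""
    words = noun_phrase.split()
    head = words[-1].lower()  # assumption: the head noun is the last word
    choice = RULES.get(head, DEFAULT)
    if choice == "a/an":
        choice = indefinite_form(words[0])
    return f"{choice} {noun_phrase}"


print(choose_article("technical report"))
print(choose_article("old university"))
print(choose_article("sun"))
```

Because the function only consumes and returns plain text, a step of this shape could in principle sit after any MT system's output, which is the portability argument the abstract makes. A realistic module would need far richer context than the head noun alone.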
