Computer Science – Computation and Language
A Geometric Approach to Mapping Bitext Correspondence
Scientific paper
1996-09-28
15 pages, minor revisions on Sept. 30, 1996
The first step in most corpus-based multilingual NLP work is to construct a detailed map of the correspondence between a text and its translation. Several automatic methods for this task have been proposed in recent years. Yet even the best of these methods can err by several typeset pages. The Smooth Injective Map Recognizer (SIMR) is a new bitext mapping algorithm. SIMR's errors are smaller than those of the previous front-runner by more than a factor of 4. Its robustness has enabled new commercial-quality applications. The greedy nature of the algorithm makes it independent of memory resources. Unlike other bitext mapping algorithms, SIMR allows crossing correspondences to account for word order differences. Its output can be converted quickly and easily into a sentence alignment. SIMR's output has been used to align over 200 megabytes of the Canadian Hansards for publication by the Linguistic Data Consortium.
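The abstract gives no algorithmic detail, so the sketch below is only a loose illustration of two terms it uses. It is not SIMR. It assumes a hypothetical representation in which a bitext map is a list of (x, y) points, with x a character offset in the text and y the matching offset in its translation. "Injective" is checked as "no two points share an x or a y offset". Because no monotonic order is required, crossing correspondences are allowed. The conversion to sentence pairs is a deliberately naive, hypothetical scheme: it pairs any two sentences that share enough map points. It is not the conversion method used in the paper. The names `is_injective`, `sentence_index`, `sentence_pairs` and `min_votes` are invented for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical representation: (offset in text, offset in translation).
Point = tuple[int, int]


def is_injective(points: list[Point]) -> bool:
    """True if no two points share an x offset or a y offset (one-to-one).

    Point order is not checked, so crossing correspondences
    (e.g. (10, 20) and (20, 10)) are still accepted.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return len(set(xs)) == len(xs) and len(set(ys)) == len(ys)


def sentence_index(starts: list[int], offset: int) -> int:
    """Index of the sentence containing `offset`.

    Assumes `starts` holds sorted sentence start offsets beginning with 0.
    """
    return bisect_right(starts, offset) - 1


def sentence_pairs(
    points: list[Point],
    x_starts: list[int],
    y_starts: list[int],
    min_votes: int = 1,
) -> list[tuple[int, int]]:
    """Naive, hypothetical conversion of map points into sentence pairs.

    Each point "votes" for the (text sentence, translation sentence) pair
    it falls in. Pairs with at least `min_votes` votes are returned, sorted.
    The result may be many-to-many; no alignment constraints are enforced.
    """
    votes = Counter(
        (sentence_index(x_starts, x), sentence_index(y_starts, y))
        for x, y in points
    )
    return sorted(pair for pair, n in votes.items() if n >= min_votes)
```

For example, with points `[(2, 3), (15, 18), (30, 41), (44, 52)]`, text sentences starting at offsets `[0, 20, 40]` and translation sentences starting at `[0, 25, 50]`, `sentence_pairs` returns `[(0, 0), (1, 1), (2, 2)]`. A real aligner would also need to resolve conflicting votes and enforce alignment constraints, which this sketch does not attempt.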