An Algorithm for Aligning Sentences in Bilingual Corpora Using Lexical Information

Computer Science – Computation and Language

Scientific paper

Details

10 pages, 5 figures. Conference: International Conference on Natural Language Processing 2002, Mumbai

This paper describes an algorithm for aligning sentences with their translations in a bilingual corpus using lexical information about the two languages. Existing efficient algorithms ignore word identities and consider only sentence lengths (Brown, 1991; Gale and Church, 1993). For each sentence in the source-language text, the proposed algorithm picks the most likely translation from the target-language text using lexical information and certain heuristics; it performs no statistical analysis of sentence lengths. The algorithm is language independent and also helps detect additions and deletions of text in translations. In most cases it gives results comparable to those of existing algorithms, and it does better in cases where the statistical algorithms do not give good results.
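The abstract does not specify the scoring function or the heuristics, so the following is only a minimal sketch of the general idea of lexicon-based alignment, not the authors' algorithm. It assumes a hypothetical bilingual lexicon (a dict mapping each source word to a set of target words), whitespace tokenization, a monotonic forward search window, and an arbitrary score threshold; the names `lexical_score`, `align`, `unaligned_targets`, `window`, and `threshold` are all illustrative.

```python
from typing import Dict, List, Optional, Set, Tuple


def lexical_score(src: str, tgt: str, lexicon: Dict[str, Set[str]]) -> float:
    """Fraction of source words that have a lexicon translation present in tgt."""
    src_words = src.lower().split()
    tgt_words = set(tgt.lower().split())
    if not src_words:
        return 0.0
    hits = sum(1 for w in src_words if lexicon.get(w, set()) & tgt_words)
    return hits / len(src_words)


def align(
    src_sents: List[str],
    tgt_sents: List[str],
    lexicon: Dict[str, Set[str]],
    window: int = 2,        # assumed: how far ahead in the target text to look
    threshold: float = 0.3, # assumed: minimum score to accept a match
) -> List[Tuple[int, Optional[int]]]:
    """Pair each source sentence with its best-scoring target sentence, or None."""
    alignments: List[Tuple[int, Optional[int]]] = []
    pos = 0  # next target index not yet consumed (keeps the alignment monotonic)
    for i, sent in enumerate(src_sents):
        best_j: Optional[int] = None
        best = 0.0
        for j in range(pos, min(len(tgt_sents), pos + window + 1)):
            score = lexical_score(sent, tgt_sents[j], lexicon)
            if score > best:  # ties keep the earliest candidate
                best_j, best = j, score
        if best_j is not None and best >= threshold:
            alignments.append((i, best_j))
            pos = best_j + 1
        else:
            # No lexical support nearby: the sentence may have been deleted.
            alignments.append((i, None))
    return alignments


def unaligned_targets(
    alignments: List[Tuple[int, Optional[int]]], n_tgt: int
) -> List[int]:
    """Target indices matched by no source sentence (candidate additions)."""
    used = {j for _, j in alignments if j is not None}
    return [j for j in range(n_tgt) if j not in used]
```

In this sketch, a source sentence paired with `None` is a candidate deletion and any index returned by `unaligned_targets` is a candidate addition, which mirrors the detection of added and deleted text that the abstract mentions. A real system would need proper tokenization and a more careful treatment of one-to-many alignments.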
