Cross-Language Information Retrieval for Technical Documents

Computer Science – Computation and Language

Scientific paper

Details

9 pages, 5 Postscript figures, uses colacl.sty and psfig.tex

This paper proposes a Japanese/English cross-language information retrieval (CLIR) system targeting technical documents. Our system first translates a given query containing technical terms into the target language, and then retrieves documents relevant to the translated query. Translating technical terms remains problematic for two reasons. First, technical terms are often compound words, so new terms can be created continually simply by combining existing base words. Second, Japanese often represents loanwords phonetically using its phonograms. Consequently, existing dictionaries struggle to achieve sufficient coverage. To counter the first problem, we use a compound word translation method, which uses a bilingual dictionary for base words and collocational statistics to resolve translation ambiguity. For the second problem, we propose a transliteration method, which identifies phonetic equivalents in the target language. We also show the effectiveness of our system using a test collection for CLIR.
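The compound word translation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary entries, the collocation counts, the romanized input, the add-one smoothing, and the names `BASE_DICT`, `BIGRAM_COUNTS`, `score`, and `translate_compound` are all hypothetical, chosen only to show how base-word translations might be combined and how collocational statistics might pick among the ambiguous combinations.

```python
from itertools import product
from math import log

# Hypothetical toy bilingual dictionary for base words
# (romanized Japanese -> candidate English translations).
BASE_DICT = {
    "jouhou": ["information", "intelligence"],
    "kensaku": ["retrieval", "search"],
}

# Hypothetical collocation counts for adjacent English word pairs,
# standing in for statistics gathered from a target-language corpus.
BIGRAM_COUNTS = {
    ("information", "retrieval"): 120,
    ("information", "search"): 30,
    ("intelligence", "search"): 2,
}


def score(candidate, counts, smoothing=1.0):
    """Sum of log smoothed counts over adjacent word pairs in the candidate."""
    pairs = zip(candidate, candidate[1:])
    return sum(log(counts.get(pair, 0) + smoothing) for pair in pairs)


def translate_compound(base_words, dictionary=BASE_DICT, counts=BIGRAM_COUNTS):
    """Pick the combination of base-word translations with the best collocation score.

    Base words missing from the dictionary are passed through unchanged
    (an assumption of this sketch, not something stated in the abstract).
    """
    options = [dictionary.get(word, [word]) for word in base_words]
    return max(product(*options), key=lambda cand: score(cand, counts))


if __name__ == "__main__":
    print(" ".join(translate_compound(["jouhou", "kensaku"])))
```

In this toy setting the pair "information retrieval" wins because it has the highest collocation count among the four candidate combinations. Exhaustive enumeration is only practical for short compounds; a real system would need a more scalable search and corpus-derived statistics.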
