Cross-lingual keyword assignment

Computer Science – Computation and Language

Scientific paper

Details

Precursor paper to cs.CL/0609059. The automatic classification system described here has now matured and is in daily use for d

This paper presents a language-independent approach to controlled-vocabulary keyword assignment using the EUROVOC thesaurus. Because EUROVOC is multilingual, the keywords assigned to a document written in one language can be displayed in all eleven official European Union languages. Mapping documents written in different languages to the same multilingual thesaurus also allows cross-language document comparison. The thesaurus descriptors are assigned by a statistical method that uses a collection of manually indexed documents to identify, for each descriptor, a large number of lemmas that are statistically associated with it. During assignment, these associated lemmas are used to produce a ranked list of the EUROVOC terms most likely to be good keywords for a given document. The paper also describes the challenges of this task and discusses the results achieved by the fully functional prototype.
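The two stages named in the abstract (learning lemma associations per descriptor from manually indexed documents, then ranking descriptors for a new document) can be illustrated with a minimal Python sketch. The abstract does not say which association statistic the authors use, so this sketch substitutes pointwise mutual information over document counts as a stand-in. The function names `train_associations` and `assign_descriptors`, the parameters `top_n` and `k`, the toy training data, and the descriptor labels are all hypothetical and are not taken from the paper or from EUROVOC.

```python
import math
from collections import Counter, defaultdict


def train_associations(indexed_docs, top_n=3):
    """Learn {descriptor: {lemma: weight}} from (lemmas, descriptors) pairs.

    Assumption: the weight is pointwise mutual information (PMI) computed
    from document frequencies. The paper's actual statistic is not given
    in the abstract.
    """
    n_docs = len(indexed_docs)
    lemma_df = Counter()             # number of documents containing each lemma
    desc_df = Counter()              # number of documents indexed with each descriptor
    joint_df = defaultdict(Counter)  # per descriptor: documents containing each lemma

    for lemmas, descriptors in indexed_docs:
        unique_lemmas = set(lemmas)
        lemma_df.update(unique_lemmas)
        for desc in descriptors:
            desc_df[desc] += 1
            joint_df[desc].update(unique_lemmas)

    associations = {}
    for desc, counts in joint_df.items():
        weights = {}
        for lemma, joint in counts.items():
            # Observed co-occurrence relative to what chance would predict.
            pmi = math.log(joint * n_docs / (lemma_df[lemma] * desc_df[desc]))
            if pmi > 0:  # keep only lemmas that occur more often than chance
                weights[lemma] = pmi
        best = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        associations[desc] = dict(best)
    return associations


def assign_descriptors(lemmas, associations, k=2):
    """Rank descriptors by the summed weights of associated lemmas present."""
    present = set(lemmas)
    scores = {}
    for desc, weights in associations.items():
        score = sum(w for lemma, w in weights.items() if lemma in present)
        if score > 0:
            scores[desc] = score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Toy, invented training collection: (lemmas of a document, its manual descriptors).
TRAINING = [
    (["fish", "quota", "vessel"], {"fishing policy"}),
    (["fish", "catch", "vessel"], {"fishing policy"}),
    (["tariff", "import", "quota"], {"trade policy"}),
    (["tariff", "export", "import"], {"trade policy"}),
]

if __name__ == "__main__":
    model = train_associations(TRAINING)
    for descriptor, score in assign_descriptors(["tariff", "fish", "import"], model):
        print(descriptor, round(score, 3))
```

Because the ranked output consists of thesaurus descriptors rather than words of the source language, the same ranked list could in principle be shown in any language the thesaurus covers, which is the property the abstract relies on for cross-language comparison. A real system would need a lemmatizer per language and a far larger training collection than this toy example.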
