A Faster Structured-Tag Word-Classification Method

Computer Science – Computation and Language

Scientific paper


Details

10 pages, Microsoft Word 6.0, ps


Several methods have been proposed for processing a corpus to induce a tagset for the sub-language that the corpus represents. This paper examines a structured-tag word-classification method introduced by McMahon (1994) and discussed further by McMahon & Smith (1995) in cmp-lg/9503011. Two major variations, (1) non-random initial assignment of words to classes and (2) moving multiple words in parallel, together provide robust non-random results with a speed increase of 200% to 450%, at the cost of quality slightly below the average achieved by McMahon's method. Two further variations, (3) retaining information from less-frequent words and (4) avoiding reclustering closed classes, are proposed for further study.

Note: The speed increases quoted above are relative to my implementation of my understanding of McMahon's algorithm, which takes hours to days on a home PC. A revised version of the McMahon & Smith (1995) paper appeared (June 1996) in Computational Linguistics 22(2):217-247; it refers to a time of "several weeks" to cluster 569 words on a Sparc-IPC.
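To make variations (1) and (2) concrete, here is a minimal sketch, not the paper's implementation. The abstract does not say how classes are scored or how the two variations are realised, so everything below is an assumption made for illustration: a single binary split of the vocabulary, scored by the average mutual information between the classes of adjacent words; a deterministic start that alternates classes down the frequency ranking; and a batch rule that flips all individually improving words at once and halves the batch if the combined move lowers the score. The function names are hypothetical.

```python
import math
from collections import Counter


def average_mutual_information(tokens, cls):
    """Average mutual information between the classes of adjacent tokens."""
    pairs = Counter((cls[a], cls[b]) for a, b in zip(tokens, tokens[1:]))
    total = sum(pairs.values())
    left, right = Counter(), Counter()
    for (c1, c2), n in pairs.items():
        left[c1] += n
        right[c2] += n
    return sum(
        (n / total) * math.log2(n * total / (left[c1] * right[c2]))
        for (c1, c2), n in pairs.items()
    )


def initial_assignment(tokens):
    """Assumed non-random start: alternate classes 0/1 by frequency rank."""
    ranked = sorted(Counter(tokens).items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: i % 2 for i, (word, _) in enumerate(ranked)}


def split_in_two(tokens, max_sweeps=20, eps=1e-12):
    """Split the vocabulary into two classes using batched moves."""
    cls = initial_assignment(tokens)
    best = average_mutual_information(tokens, cls)
    for _ in range(max_sweeps):
        # Score every single-word flip against the same current assignment.
        gains = {}
        for word in cls:
            cls[word] ^= 1
            gains[word] = average_mutual_information(tokens, cls) - best
            cls[word] ^= 1
        ranked = sorted(gains.items(), key=lambda kv: (-kv[1], kv[0]))
        movers = [word for word, gain in ranked if gain > eps]
        # Flip all improving words together; halve the batch if that hurts.
        while movers:
            for word in movers:
                cls[word] ^= 1
            score = average_mutual_information(tokens, cls)
            if score > best + eps:
                best = score
                break
            for word in movers:
                cls[word] ^= 1  # revert the rejected batch
            movers = movers[: len(movers) // 2]
        else:
            break  # no batch improved the score, so stop sweeping
    return cls, best
```

Because the start is deterministic and ties are broken alphabetically, repeated runs on the same corpus give the same split, which is one plausible reading of "robust non-random results". Applying the same split recursively within each class would yield bit-string tags, but that step is left out here.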
