Computer Science – Computation and Language
Scientific paper
1996-10-25
27-Aug-96 PRICAI-96 Workshop on Future Issues for Multi-lingual Text Processing, Cairns, Australia. ISBN 0 86857 730 8
10 pages, Microsoft Word 6.0, ps
Several methods have been proposed for processing a corpus to induce a tagset for the sub-language that the corpus represents. This paper examines a structured-tag word-classification method introduced by McMahon (1994) and discussed further by McMahon & Smith (1995) in cmp-lg/9503011. Two major variations, (1) non-random initial assignment of words to classes and (2) moving multiple words in parallel, together provide robust non-random results with a speed increase of 200% to 450%, at the cost of quality slightly below the average achieved by McMahon's method. Two further variations, (3) retaining information from less-frequent words and (4) avoiding reclustering closed classes, are proposed for further study.

Note: The speed increases quoted above are relative to my implementation of my understanding of McMahon's algorithm, which takes hours to days to run on a home PC. A revised version of the McMahon & Smith (1995) paper appeared in June 1996 in Computational Linguistics 22(2):217-247; it refers to a time of "several weeks" to cluster 569 words on a Sparc-IPC.
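The abstract does not spell out the clustering objective or the move procedure, so the following is only a rough sketch of the contrast behind variation (2). It assumes, as an illustration rather than a statement of either author's method, that each word carries one binary class bit and that the score is the average mutual information between the classes of adjacent words. The function names (`average_mutual_information`, `serial_pass`, `parallel_pass`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def average_mutual_information(bigrams, cls):
    """Average mutual information between the classes of adjacent words.

    Assumption: this objective is chosen for illustration only.
    """
    pair = Counter((cls[a], cls[b]) for a, b in bigrams)
    total = sum(pair.values())
    left, right = Counter(), Counter()
    for (c1, c2), n in pair.items():
        left[c1] += n
        right[c2] += n
    return sum(
        (n / total) * math.log2(n * total / (left[c1] * right[c2]))
        for (c1, c2), n in pair.items()
    )


def serial_pass(bigrams, cls):
    """Flip one word at a time; keep a flip only if it raises the score."""
    cls = dict(cls)
    best = average_mutual_information(bigrams, cls)
    for w in sorted(cls):
        cls[w] ^= 1
        score = average_mutual_information(bigrams, cls)
        if score > best:
            best = score
        else:
            cls[w] ^= 1  # undo the flip
    return cls, best


def parallel_pass(bigrams, cls):
    """Score every single-word flip against the same starting state,
    then apply all individually improving flips at once.

    Simultaneous flips can interact, so the score is recomputed
    afterwards rather than assumed to have improved.
    """
    base = average_mutual_information(bigrams, cls)
    movers = []
    for w in sorted(cls):
        trial = dict(cls)
        trial[w] ^= 1
        if average_mutual_information(bigrams, trial) > base:
            movers.append(w)
    new = dict(cls)
    for w in movers:
        new[w] ^= 1
    return new, average_mutual_information(bigrams, new)
```

In this sketch the serial pass re-scores the corpus after every accepted move, while the parallel pass evaluates all candidate moves against one fixed state, which is where a speed gain could plausibly come from. Because moves applied together can cancel each other out, a batch of individually good flips is not guaranteed to improve the score; how the paper handles this is not stated in the abstract.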
A Faster Structured-Tag Word-Classification Method