Computer Science – Computation and Language
Scientific paper
1996-06-11
uuencoded postscript file. email: cmp-lg/9606012
We report our development of a simple but fast and efficient inductive unsupervised semantic tagger for Chinese words. A hand POS-tagged corpus of 348,000 words is used. The corpus is tagged in two steps. First, possible semantic tags are selected using a semantic dictionary (Tong Yi Ci Ci Lin), the POS, and the conditional probability of a semantic tag given the POS, i.e., P(S|P). The final semantic tag is then assigned by considering the semantic tags before and after the current word, together with the semantic-word conditional probability P(S|W) derived from the first step. Semantic bigram probabilities P(S|S) are used in the second step. Final manual checking shows that this simple but efficient algorithm has a hit rate of 91%. The tagger tags 142 words per second on a 120 MHz Pentium running FOXPRO. It runs about 2.3 times faster than a Viterbi tagger.
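The two-step procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration written in Python, not the authors' FOXPRO implementation. The function names, the tag labels S1 and S2, the tokens w_a, w_b, w_c, and all probability values are hypothetical. The abstract does not say how neighbouring tags are obtained, how P(S|W) is estimated, or how unseen bigrams are handled, so the sketch assumes that each neighbour contributes its most probable tag under P(S|W), that P(S|W) comes from P(S|P)-weighted counts over the step-1 candidates, and that unseen bigrams get a small floor value.

```python
from collections import defaultdict


def candidate_tags(word, pos, lexicon, p_s_given_pos):
    """Step 1: dictionary tags for `word` that are possible for `pos`."""
    tags = lexicon.get(word, [])
    kept = [s for s in tags if p_s_given_pos.get((s, pos), 0.0) > 0.0]
    # Assumption: fall back to all dictionary tags if P(S|P) rules out every one.
    return kept or list(tags)


def estimate_p_s_given_w(corpus, lexicon, p_s_given_pos):
    """Derive P(S|W) by spreading each token over its step-1 candidates."""
    counts = defaultdict(lambda: defaultdict(float))
    for word, pos in corpus:
        cands = candidate_tags(word, pos, lexicon, p_s_given_pos)
        if not cands:
            continue  # word is not in the semantic dictionary
        weights = [p_s_given_pos.get((s, pos), 0.0) for s in cands]
        total = sum(weights)
        for s, w in zip(cands, weights):
            counts[word][s] += w / total if total > 0 else 1.0 / len(cands)
    return {
        word: {s: c / sum(dist.values()) for s, c in dist.items()}
        for word, dist in counts.items()
    }


def tag_sentence(words, p_s_given_w, bigram, floor=1e-6):
    """Step 2: pick each tag from P(S|W) and the bigrams with both neighbours."""
    # Assumption: a neighbour is represented by its best tag under P(S|W).
    first = [
        max(p_s_given_w[w], key=p_s_given_w[w].get) if w in p_s_given_w else None
        for w in words
    ]
    final = []
    for i, w in enumerate(words):
        if first[i] is None:
            final.append(None)  # no candidates, leave the word untagged
            continue
        prev_tag = first[i - 1] if i > 0 else None
        next_tag = first[i + 1] if i + 1 < len(words) else None

        def score(s):
            value = p_s_given_w[w][s]
            if prev_tag is not None:
                value *= bigram.get((prev_tag, s), floor)  # P(s | previous tag)
            if next_tag is not None:
                value *= bigram.get((s, next_tag), floor)  # P(next tag | s)
            return value

        final.append(max(p_s_given_w[w], key=score))
    return final


# Toy data, invented for illustration only.
lexicon = {"w_a": ["S1"], "w_b": ["S1", "S2"], "w_c": ["S2"]}
p_s_given_pos = {("S1", "N"): 0.5, ("S2", "N"): 0.5}
bigram = {("S1", "S1"): 0.9, ("S1", "S2"): 0.1,
          ("S2", "S1"): 0.2, ("S2", "S2"): 0.8}
corpus = [("w_a", "N"), ("w_b", "N"), ("w_c", "N")]

p_s_given_w = estimate_p_s_given_w(corpus, lexicon, p_s_given_pos)
print(tag_sentence(["w_a", "w_b"], p_s_given_w, bigram))  # ['S1', 'S1']
print(tag_sentence(["w_b", "w_c"], p_s_given_w, bigram))  # ['S2', 'S2']
```

In the toy data the ambiguous token w_b has equal P(S|W) for both tags, so its final tag is decided entirely by its neighbour. Each word is scored once against fixed neighbour tags rather than over all tag sequences, which is the kind of local decision that would make such a tagger cheaper than a Viterbi search.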
Paper title: An Efficient Inductive Unsupervised Semantic Tagger