Computer Science – Computation and Language
Scientific paper
1995-04-24
Submitted to Computational Linguistics
Given a previously unseen form that is morphologically n-ways ambiguous, what is the best estimator for the lexical prior probabilities of the various functions of the form? We argue that the best estimator is provided by computing the relative frequencies of the various functions among the hapax legomena, that is, the forms that occur exactly once in a corpus. This result has important implications for the development of stochastic morphological taggers, especially when some initial hand-tagging of a corpus is required: to predict lexical priors for very low-frequency morphologically ambiguous types (most of which would not occur in any given corpus), one should concentrate on tagging a good representative sample of the hapax legomena rather than extensively tagging words of all frequency ranges.
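The estimator described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `hapax_priors`, the `(form, function)` input format, and the toy corpus with its labels `A` and `B` are assumptions made only for illustration. The sketch counts how often each form occurs, keeps the forms that occur exactly once, and returns the relative frequency of each function among them.

```python
from collections import Counter


def hapax_priors(tagged_tokens):
    """Relative frequency of each function among the hapax legomena.

    tagged_tokens: list of (form, function) pairs, one per corpus token.
    (Hypothetical input format, assumed for this sketch.)
    """
    # Count how many times each surface form occurs in the corpus.
    form_counts = Counter(form for form, _ in tagged_tokens)

    # Keep only tokens whose form occurs exactly once (hapax legomena)
    # and tally the function each of them was tagged with.
    function_counts = Counter(
        function
        for form, function in tagged_tokens
        if form_counts[form] == 1
    )

    total = sum(function_counts.values())
    if total == 0:
        return {}  # no hapax legomena, so there is nothing to estimate from
    return {f: n / total for f, n in function_counts.items()}


# Invented toy corpus: "f1" occurs three times, the other forms once each.
corpus = [
    ("f1", "A"), ("f1", "B"), ("f1", "B"),
    ("f2", "B"),
    ("f3", "A"),
    ("f4", "B"),
    ("f5", "A"),
    ("f6", "A"),
]

priors = hapax_priors(corpus)
print(priors)
```

In this toy corpus the frequent form `f1` is ignored, and the estimate rests on the five hapax legomena alone, three tagged `A` and two tagged `B`. An estimate computed over all tokens would instead be pulled toward the behaviour of `f1`, which is the contrast the abstract draws.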
Baayen Harald
Sproat Richard
Estimating Lexical Priors for Low-Frequency Syncretic Forms