Using eigenvectors of the bigram graph to infer morpheme identity

Computer Science – Computation and Language

Scientific paper

This paper describes experiments with statistical methods for inferring the syntactic behavior of words and morphemes from a raw corpus in an unsupervised fashion. It has points in common with Brown et al. (1992) and the work that has grown out of it: it uses statistical techniques to analyze syntactic behavior based on which words occur adjacent to a given word. Unlike that work, however, we use an eigenvector decomposition of a nearest-neighbor graph to produce a two-dimensional rendering of the words of a corpus, in which words of the same syntactic category tend to form neighborhoods. We exploit this technique to extend the value of automatic learning of morphology. In particular, we look at the suffixes derived from a corpus by unsupervised learning of morphology and ask which of them have a consistent syntactic function (e.g., in English, -tion is primarily a mark of nouns, but -s marks both noun plurals and the third-person present on verbs). We find that the method works well for this task.
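The abstract does not spell out how the graph is built or which eigenvectors are used, so the following is only a minimal sketch of the general idea, not the paper's procedure. It assumes that each word is described by counts of its left and right neighbors, that the nearest-neighbor graph links each word to its k most similar words under cosine similarity, and that the two-dimensional coordinates come from the eigenvectors of the normalized graph Laplacian with the second and third smallest eigenvalues. The toy corpus, the value of k, and all function names are illustrative assumptions.

```python
import numpy as np

def context_vectors(tokens, vocab):
    """Left- and right-neighbor counts per word (assumed feature design)."""
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    left = np.zeros((n, n))
    right = np.zeros((n, n))
    for prev, nxt in zip(tokens, tokens[1:]):
        if prev in index and nxt in index:
            right[index[prev], index[nxt]] += 1  # nxt occurs to the right of prev
            left[index[nxt], index[prev]] += 1   # prev occurs to the left of nxt
    return np.hstack([left, right])

def knn_graph(features, k):
    """Symmetric 0/1 adjacency linking each word to its k most similar words."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.where(norms == 0, 1.0, norms)
    sim = unit @ unit.T                      # cosine similarity
    np.fill_diagonal(sim, -np.inf)           # a word is not its own neighbor
    n = sim.shape[0]
    adj = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(sim[i])[-k:]:    # indices of the k largest similarities
            adj[i, j] = 1.0
    return np.maximum(adj, adj.T)            # symmetrize

def spectral_embedding(adj, dims=2):
    """Eigenvalues of the normalized Laplacian and the first `dims`
    eigenvectors after the one with the smallest eigenvalue."""
    deg = adj.sum(axis=1)                    # every node has at least k neighbors
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(lap)   # eigenvalues in ascending order
    return eigvals, eigvecs[:, 1:dims + 1]

# Toy corpus (hypothetical); a real experiment would use a large raw corpus.
tokens = ("the dog runs . the cat runs . a dog sleeps . "
          "a cat sleeps . the dog sleeps . a cat runs .").split()
vocab = sorted(set(tokens))

features = context_vectors(tokens, vocab)
adj = knn_graph(features, k=2)
eigvals, coords = spectral_embedding(adj, dims=2)

for word, (x, y) in zip(vocab, coords):
    print(f"{word:8s} {x:+.3f} {y:+.3f}")
```

On a corpus of realistic size, plotting `coords` would be the natural way to check whether words of the same category cluster together. Words sharing a given suffix could then be located in the same plane to see whether they fall in one neighborhood or several.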
