Linguistic complexity: English vs. Polish, text vs. corpus

Computer Science – Computation and Language

Scientific paper

We analyze the rank-frequency distributions of words in selected English and Polish texts. We show that for the lemmatized (basic) word forms the scale-invariant regime breaks down after about two decades, while for the inflected word forms it may hold across the whole range of ranks. We also find that for a corpus consisting of texts written by different authors the basic scale-invariant regime is broken more strongly than for a comparable corpus consisting of texts written by the same author. Similarly, for a corpus consisting of texts translated into Polish from other languages the scale-invariant regime is broken more strongly than for a comparable corpus of native Polish texts. Moreover, we find that if the words are tagged with their proper part of speech, only verbs show a rank-frequency distribution that is almost scale-invariant.
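
To illustrate what a rank-frequency analysis of this kind involves, the following is a minimal sketch in Python. It is not the authors' procedure: the function names `rank_frequency` and `zipf_slope` are hypothetical, tokenization and lemmatization are assumed to have been done beforehand, and the plain least-squares fit in log-log space is a simplifying assumption that gives only a rough estimate of the scaling exponent. A scale-invariant (power-law) regime shows up as an approximately straight line in the log-log plot, so a slope that changes when the fit is restricted to the first two decades (ranks 1 to 100) versus the full range hints that the regime breaks.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word frequencies sorted in descending order.

    Index i of the result holds the frequency of the word with rank i + 1.
    """
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_slope(freqs, max_rank=None):
    """Least-squares slope of log10(frequency) against log10(rank).

    max_rank (an illustrative parameter) restricts the fit to the top
    ranks, e.g. max_rank=100 covers the first two decades.
    """
    if max_rank:
        freqs = freqs[:max_rank]
    if len(freqs) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log10(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log10(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

Under these assumptions, one would call `rank_frequency` once on the inflected tokens and once on their lemmas, then compare `zipf_slope(freqs, max_rank=100)` with `zipf_slope(freqs)` for each list.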
