Linguistic complexity: English vs. Polish, text vs. corpus
Computer Science – Computation and Language
Scientific paper
2010-07-06
Acta Phys. Pol. A 117, 716-720 (2010)
We analyze the rank-frequency distributions of words in selected English and Polish texts. We show that for the lemmatized (basic) word forms the scale-invariant regime breaks down after about two decades of rank, while for the inflected word forms it may hold over the whole range of ranks. We also find that in a corpus of texts written by different authors the basic scale-invariant regime is broken more strongly than in a comparable corpus of texts written by a single author. Similarly, in a corpus of texts translated into Polish from other languages the scale-invariant regime is broken more strongly than in a comparable corpus of native Polish texts. Moreover, we find that if the words are tagged with their correct part of speech, only verbs show a rank-frequency distribution that is almost scale-invariant.
Drozdz, Stanislaw
Kwapien, Jaroslaw
Orczyk, Adam
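The rank-frequency analysis summarized in the abstract can be illustrated with a minimal Python sketch. It assumes the text has already been tokenized (and lemmatized, for the basic-form variant) and, for the part-of-speech comparison, already tagged by some external tool; the helper names `rank_frequency`, `zipf_exponent` and `rank_frequency_by_pos` are hypothetical and do not come from the paper. The exponent is estimated with an ordinary least-squares fit of log f against log r over a chosen rank window. This is a simple stand-in and not necessarily the fitting procedure the authors used.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word counts sorted in descending order.

    Element i of the result is the frequency of the word with rank i + 1.
    Tokens are assumed to be pre-tokenized (and lemmatized, if the
    lemmatized variant is wanted); no linguistic processing happens here.
    """
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_exponent(freqs, r_min=1, r_max=None):
    """Estimate alpha in f(r) ~ r**(-alpha) over ranks r_min..r_max (inclusive).

    Uses an ordinary least-squares fit of log f against log r. This is an
    illustrative assumption, not necessarily the paper's fitting method.
    """
    if r_max is None:
        r_max = len(freqs)
    if not 1 <= r_min < r_max <= len(freqs):
        raise ValueError("need 1 <= r_min < r_max <= number of ranks")
    xs = [math.log(r) for r in range(r_min, r_max + 1)]
    ys = [math.log(freqs[r - 1]) for r in range(r_min, r_max + 1)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted log-log slope equals -alpha, so flip the sign.
    return -sxy / sxx


def rank_frequency_by_pos(tagged_tokens):
    """Build a separate rank-frequency list for each part-of-speech tag.

    tagged_tokens is an iterable of (word, tag) pairs; the tag set
    (e.g. "VERB", "NOUN") is whatever the external tagger produced.
    """
    counts_by_tag = {}
    for word, tag in tagged_tokens:
        counts_by_tag.setdefault(tag, Counter())[word] += 1
    return {tag: sorted(c.values(), reverse=True) for tag, c in counts_by_tag.items()}


if __name__ == "__main__":
    # Synthetic, exactly Zipfian frequencies f(r) = 1000 / r, so both
    # windows should give alpha close to 1 (no break in scale invariance).
    ideal = [1000.0 / r for r in range(1, 1001)]
    print(zipf_exponent(ideal, 1, 100), zipf_exponent(ideal, 101, 1000))
    print(rank_frequency("a b a c a b".split()))  # [3, 2, 1]
```

Fitting ranks 1 to 100 and ranks 101 to 1000 separately and comparing the two exponents gives a rough indication of whether a single power law covers the whole range or whether the scale-invariant regime breaks after about two decades. Applying `zipf_exponent` to each list returned by `rank_frequency_by_pos` gives the analogous per-part-of-speech comparison.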