Size dependent word frequencies and translational invariance of books

Computer Science – Computation and Language

Scientific paper


Details

10 pages, 2 appendices (6 pages), 5 figures

DOI: 10.1016/j.physa.2009.09.022

It is shown that a real novel shares many characteristic features with a null model in which the words are randomly distributed throughout the text. One such common feature is a certain translational invariance of the text. Another is that the functional form of the word-frequency distribution of a novel depends on the length of the text in the same way as in the null model. This means that an approximate power-law tail ascribed to the data will have an exponent that changes with the size of the text section analyzed. A further consequence is that a novel cannot be described by text-evolution models such as the Simon model. The size transformation of a novel is found to be well described by a specific Random Book Transformation. In addition, this size transformation enables a more precise determination of the functional form of the word-frequency distribution. The implications of the results are discussed.
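The comparison described in the abstract, between a section of a text and a null model with randomly distributed words, can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract does not define the Random Book Transformation, so the sketch covers only the null-model comparison. The function names are hypothetical, and `toy_text` is an assumed stand-in for a real novel (tokens drawn with weights proportional to 1/rank), so the numbers it prints say nothing about actual books.

```python
import random
from collections import Counter


def frequency_distribution(words):
    """Map k -> number of distinct words that occur exactly k times."""
    occurrences = Counter(words)          # word -> number of occurrences
    return Counter(occurrences.values())  # occurrences -> number of words


def random_section(words, fraction, rng):
    """Null-model section: tokens drawn uniformly at random, without replacement."""
    size = int(len(words) * fraction)
    return rng.sample(words, size)


def contiguous_section(words, fraction, start=0):
    """Section of the text as written: a consecutive run of tokens."""
    size = int(len(words) * fraction)
    return words[start:start + size]


def toy_text(n_tokens, vocabulary_size, rng):
    """Hypothetical stand-in for a novel (assumption): weights proportional to 1/rank."""
    vocabulary = [f"w{i}" for i in range(1, vocabulary_size + 1)]
    weights = [1.0 / rank for rank in range(1, vocabulary_size + 1)]
    return rng.choices(vocabulary, weights=weights, k=n_tokens)


rng = random.Random(0)
text = toy_text(20000, 5000, rng)
for fraction in (1.0, 0.5, 0.25):
    real = frequency_distribution(contiguous_section(text, fraction))
    null = frequency_distribution(random_section(text, fraction, rng))
    # Share of distinct words that occur exactly once in each kind of section.
    print(fraction,
          round(real[1] / sum(real.values()), 3),
          round(null[1] / sum(null.values()), 3))
```

To try this on a real text, replace `toy_text` with a tokenized novel. If the two printed columns track each other as the section shrinks, that is the kind of agreement between text and null model that the abstract reports.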
