Computer Science – Information Theory
Scientific paper
2008-10-17
24 pages, no figures
The article presents a new interpretation of the Zipf-Mandelbrot law in natural language, which rests on two areas of information theory. First, we construct a new class of grammar-based codes; second, we investigate properties of strongly nonergodic stationary processes. The motivation for the joint discussion is to prove a proposition with a simple informal statement: if a text of length $n$ describes $n^\beta$ independent facts in a repetitive way, then the text contains at least $n^\beta/\log n$ different words, under suitable conditions on $n$. The formal statement adopts two modeling postulates. First, the words are understood as nonterminal symbols of the shortest grammar-based encoding of the text. Second, the text is assumed to be emitted by a finite-energy strongly nonergodic source, whereas the facts are binary IID variables predictable in a shift-invariant way.
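To make the first modeling postulate concrete, the following is a minimal sketch of counting the "words" of a text as the nonterminals of a grammar-based encoding. It is not the code class constructed in the paper: finding the shortest grammar is hard in general, so this sketch swaps in a greedy Re-Pair-style heuristic. The function names (`repair_grammar`, `expand`, `informal_vocabulary_bound`) are hypothetical, and the bound helper simply evaluates the informal expression $n^\beta/\log n$, assuming a natural logarithm and ignoring constants and the conditions on $n$.

```python
import math
from collections import Counter


def repair_grammar(text):
    """Greedy Re-Pair-style heuristic (an assumption, not the paper's code):
    repeatedly replace the most frequent adjacent pair of symbols with a
    fresh nonterminal until no adjacent pair occurs at least twice."""
    seq = list(text)
    rules = {}  # nonterminal -> (left symbol, right symbol)
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        nonterminal = ("N", next_id)  # tuples cannot clash with characters
        next_id += 1
        rules[nonterminal] = pair
        # Rewrite the sequence left to right, replacing non-overlapping matches.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the string of terminals it derives."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol


def informal_vocabulary_bound(n, beta):
    """Evaluate n^beta / log n (natural log assumed; constants ignored)."""
    return n ** beta / math.log(n)


if __name__ == "__main__":
    text = "the cat sat on the mat and the cat sat on the hat"
    start, rules = repair_grammar(text)
    # The grammar must reproduce the text exactly.
    assert "".join(expand(s, rules) for s in start) == text
    print("vocabulary size (nonterminals):", len(rules))
    print("informal bound for n=10000, beta=0.5:",
          informal_vocabulary_bound(10_000, 0.5))
```

Here the vocabulary size is the number of rules in the grammar. A heuristic grammar is generally larger than the shortest one, so the count is only illustrative and should not be read as a test of the proposition.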
On the Vocabulary of Grammar-Based Codes and the Logical Consistency of Texts