Computer Science – Information Theory
Scientific paper
2005-04-13
submitted to ITW2005
Bounds on the entropy of patterns of sequences generated by independently identically distributed (i.i.d.) sources are derived. A pattern is a sequence of indices that contains all consecutive integer indices in increasing order of first occurrence. If the alphabet of a source that generated a sequence is unknown, the inevitable cost of coding the unknown alphabet symbols can be exploited to create the pattern of the sequence. This pattern can in turn be compressed on its own. The bounds derived here are functions of the i.i.d. source entropy, alphabet size, and letter probabilities. It is shown that, for large alphabets, the pattern entropy must be lower than the i.i.d. entropy. The decrease is in many cases more significant than the universal coding redundancy bounds derived in prior works. The pattern entropy is confined between two bounds that depend on the arrangement of the letter probabilities in the probability space. For very large alphabets, whose size may be greater than the coded pattern length, all low-probability letters are packed into one symbol. The pattern entropy is then upper and lower bounded in terms of the i.i.d. entropy of the new packed alphabet. Correction terms, which are usually negligible, are provided for both the upper and lower bounds.
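The definition of a pattern given in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `pattern`, `pattern_entropy`, and `iid_entropy` are hypothetical, and the brute-force enumeration is only feasible for tiny alphabets and short sequences. It replaces each symbol by the order of its first occurrence, then computes the exact entropy of the pattern of a length-n i.i.d. sequence so it can be compared with the i.i.d. entropy of the sequence itself.

```python
from itertools import product
from math import log2


def pattern(sequence):
    """Replace each symbol by the index (starting at 1) of its first occurrence."""
    first_seen = {}
    indices = []
    for symbol in sequence:
        if symbol not in first_seen:
            # A new symbol receives the next unused consecutive index.
            first_seen[symbol] = len(first_seen) + 1
        indices.append(first_seen[symbol])
    return tuple(indices)


def pattern_entropy(probs, n):
    """Exact entropy (bits) of the pattern of a length-n i.i.d. sequence.

    Brute force over all len(probs)**n sequences; an illustration only,
    not the bounding technique of the paper.
    """
    pattern_prob = {}
    for seq in product(range(len(probs)), repeat=n):
        p = 1.0
        for letter in seq:
            p *= probs[letter]
        key = pattern(seq)
        # Different sequences may share a pattern, so probabilities add up.
        pattern_prob[key] = pattern_prob.get(key, 0.0) + p
    return -sum(p * log2(p) for p in pattern_prob.values() if p > 0)


def iid_entropy(probs, n):
    """Entropy (bits) of a length-n i.i.d. sequence: n times the letter entropy."""
    return -n * sum(p * log2(p) for p in probs if p > 0)


print(pattern("lossless"))  # (1, 2, 3, 3, 1, 4, 3, 3)
print(pattern_entropy([0.5, 0.5], 2), iid_entropy([0.5, 0.5], 2))
```

Because the pattern is a deterministic function of the sequence, its entropy can never exceed that of the sequence. For a fair binary source and n = 2, the four equally likely sequences collapse into the two patterns (1, 1) and (1, 2), so the pattern entropy is 1 bit against 2 bits for the sequence.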
Bounds on the Entropy of Patterns of I.I.D. Sequences