Efficient Fully-Compressed Sequence Representations

Computer Science – Data Structures and Algorithms

Scientific paper

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

Scientific paper

We present a data structure that stores a sequence $s[1..n]$ over alphabet $[1..\sigma]$ in $n\Ho(s) + o(n)(\Ho(s){+}1)$ bits, where $\Ho(s)$ is the zero-order entropy of $s$. This structure supports the queries \access, \rank\ and \select, which are fundamental building blocks for many other compressed data structures, in worst-case time $\Oh{\lg\lg\sigma}$ and average time $\Oh{\lg \Ho(s)}$. The worst-case complexity matches the best previous results, yet these had been achieved with data structures using $n\Ho(s)+o(n\lg\sigma)$ bits. On highly compressible sequences the $o(n\lg\sigma)$ bits of the redundancy may be significant compared to the the $n\Ho(s)$ bits that encode the data. Our representation, instead, compresses the redundancy as well. Moreover, our average-case complexity is unprecedented. Our technique is based on partitioning the alphabet into characters of similar frequency. The subsequence corresponding to each group can then be encoded using fast uncompressed representations without harming the overall compression ratios, even in the redundancy. The result also improves upon the best current compressed representations of several other data structures. For example, we achieve $(i)$ compressed redundancy, retaining the best time complexities, for the smallest existing full-text self-indexes; $(ii)$ compressed permutations $\pi$ with times for $\pi()$ and $\pii()$ improved to loglogarithmic; and $(iii)$ the first compressed representation of dynamic collections of disjoint sets. We also point out various applications to inverted indexes, suffix arrays, binary relations, and data compressors. ...

No associations

LandOfFree

Say what you really think

Search LandOfFree.com for scientists and scientific papers. Rate them and share your experience with other people.

Rating

Efficient Fully-Compressed Sequence Representations does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.

If you have personal experience with Efficient Fully-Compressed Sequence Representations, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Efficient Fully-Compressed Sequence Representations will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFWR-SCP-O-116106

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.