Self-Index Based on LZ77

Computer Science – Data Structures and Algorithms

Scientific paper

We introduce the first self-index based on the Lempel-Ziv 1977 compression format (LZ77). It is particularly competitive for highly repetitive text collections, such as sequence databases of genomes of related species, software repositories, versioned document collections, and temporal text databases. Such collections are extremely compressible, but classical self-indexes fail to capture that source of compressibility. In practice, our self-index takes a few times the space of the text compressed with LZ77 (as little as 2.6 times), extracts 1 to 2 million characters of the text per second, and finds patterns at a rate of 10 to 50 microseconds per occurrence. It is smaller (up to one half) than the best current self-index for repetitive collections, and faster in many cases.
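To illustrate why repetitive collections compress so well under LZ77, the following is a minimal sketch of a greedy LZ77-style parse. It is not the self-index described in the abstract, only the underlying factorization; the function names `lz77_parse` and `lz77_decode` and the phrase layout `(source_start, length, next_char)` are assumptions chosen for illustration, and the quadratic search is for clarity rather than speed.

```python
def lz77_parse(text):
    """Greedy LZ77-style parse (illustrative sketch, quadratic time).

    Each phrase is (source_start, length, next_char): copy `length`
    characters starting at `source_start` in the text seen so far,
    then append the literal `next_char`.
    """
    phrases = []
    n = len(text)
    i = 0
    while i < n:
        best_len, best_src = 0, 0
        for j in range(i):
            k = 0
            # The copy may run past position i (overlapping source);
            # stop one character early so a literal always remains.
            while i + k < n - 1 and text[j + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_src = k, j
        phrases.append((best_src, best_len, text[i + best_len]))
        i += best_len + 1
    return phrases


def lz77_decode(phrases):
    """Rebuild the text from the phrases produced by lz77_parse."""
    out = []
    for src, length, ch in phrases:
        for k in range(length):
            # Character-by-character copy handles overlapping sources.
            out.append(out[src + k])
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    repetitive = "ACGT" * 50
    phrases = lz77_parse(repetitive)
    print(len(repetitive), "characters,", len(phrases), "phrases")
    assert lz77_decode(phrases) == repetitive
```

The number of phrases grows with the amount of new material rather than with the total length, which is the source of compressibility that an LZ77-based index aims to exploit.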
