Computer Science – Data Structures and Algorithms
Scientific paper
2011-06-14
Proceedings of the VLDB Endowment (PVLDB), Vol. 5, No. 3, pp. 265-273 (2011)
Computer Science
Data Structures and Algorithms
VLDB2012
Scientific paper
Compression techniques that support fast random access are a core component of any information system. Current state-of-the-art methods group documents into fixed-sized blocks and compress each block with a general-purpose adaptive algorithm such as GZIP. Random access to a specific document then requires decompression of a block. The choice of block size is critical: it trades between compression effectiveness and document retrieval times. In this paper we present a scalable compression method for large document collections that allows fast random access. We build a representative sample of the collection and use it as a dictionary in a LZ77-like encoding of the rest of the collection, relative to the dictionary. We demonstrate on large collections, that using a dictionary as small as 0.1% of the collection size, our algorithm is dramatically faster than previous methods, and in general gives much better compression.
Hoobin Christopher
Puglisi Simon J.
Zobel Justin
No associations
LandOfFree
Relative Lempel-Ziv Factorization for Efficient Storage and Retrieval of Web Collections does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.
If you have personal experience with Relative Lempel-Ziv Factorization for Efficient Storage and Retrieval of Web Collections, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Relative Lempel-Ziv Factorization for Efficient Storage and Retrieval of Web Collections will most certainly appreciate the feedback.
Profile ID: LFWR-SCP-O-112901