The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space

Computer Science – Data Structures and Algorithms

Scientific paper

An indexed sequence of strings is a data structure for storing a string sequence that supports random access, searching, range counting and analytics operations, both for exact matches and prefix search. String sequences lie at the core of column-oriented databases, log processing, and other storage and query tasks. In these applications each string can appear several times and the order of the strings in the sequence is relevant. The prefix structure of the strings is relevant as well: common prefixes are sought in strings to extract interesting features from the sequence. Moreover, space-efficiency is highly desirable as it translates directly into higher performance, since more data can fit in fast memory. We introduce and study the problem of the compressed indexed sequence of strings, that is, representing indexed sequences of strings in nearly-optimal compressed space, both in the static and dynamic settings, while preserving provably good performance for the supported operations. We present a new data structure for this problem, the Wavelet Trie, which combines the classical Patricia Trie with the Wavelet Tree, a succinct data structure for storing a compressed sequence. The resulting Wavelet Trie smoothly adapts to a sequence of strings that changes over time. It improves on the state-of-the-art compressed data structures by supporting a dynamic alphabet (i.e., the set of distinct strings) and prefix queries, both crucial requirements in the aforementioned applications, and on traditional indexes by reducing space occupancy to nearly the entropy of the sequence.
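The abstract only names the ingredients, so the following is a minimal static sketch of how a Patricia-style trie and per-node bitvectors can combine to answer access, exact-match counting and prefix counting over a sequence. It is an illustration under stated assumptions, not the paper's implementation: the bitvectors are kept as uncompressed prefix-sum lists, the structure is not dynamic, and the prefix-free binary encoding as well as the names `encode`, `decode`, `Node` and `WaveletTrie` are hypothetical choices made for this example.

```python
from itertools import accumulate
from os.path import commonprefix  # character-wise longest common prefix


def encode(s: str) -> str:
    """Assumed encoding: '1' + 8 bits per byte, then a final '0' (prefix-free)."""
    return "".join("1" + format(b, "08b") for b in s.encode("utf-8")) + "0"


def decode(bits: str) -> str:
    """Inverse of encode: drop each marker bit and the final terminator."""
    data = bytes(int(bits[i + 1:i + 9], 2) for i in range(0, len(bits) - 1, 9))
    return data.decode("utf-8")


class Node:
    def __init__(self, seq):
        # alpha: longest common prefix of every bit string in this subsequence.
        self.alpha = commonprefix(seq)
        k = len(self.alpha)
        if len(seq[0]) == k:      # prefix-free strings: all are identical, so leaf
            self.ones = None
            return
        bits = [int(s[k]) for s in seq]          # branching bit of each element
        self.ones = [0, *accumulate(bits)]       # ones[i] = number of 1s in bits[:i]
        self.child = [Node([s[k + 1:] for s, b in zip(seq, bits) if b == v])
                      for v in (0, 1)]

    def rank(self, b, i):
        """Number of bits equal to b among the first i positions."""
        return self.ones[i] if b else i - self.ones[i]


class WaveletTrie:
    def __init__(self, strings):
        if not strings:
            raise ValueError("this sketch assumes a non-empty sequence")
        self.n = len(strings)
        self.root = Node([encode(s) for s in strings])

    def access(self, i):
        """Return the i-th string (0-based) of the sequence."""
        node, out = self.root, []
        while True:
            out.append(node.alpha)
            if node.ones is None:
                return decode("".join(out))
            b = node.ones[i + 1] - node.ones[i]  # bit stored at position i
            out.append(str(b))
            i = node.rank(b, i)                  # position inside the child
            node = node.child[b]

    def _count(self, bits, pos):
        """Count elements among the first pos whose encoding starts with bits."""
        node = self.root
        while True:
            a = node.alpha
            if len(bits) <= len(a):
                return pos if a.startswith(bits) else 0
            if node.ones is None or not bits.startswith(a):
                return 0
            b = int(bits[len(a)])
            pos = node.rank(b, pos)
            bits = bits[len(a) + 1:]
            node = node.child[b]

    def rank(self, s, pos):
        """Occurrences of s among the first pos elements."""
        return self._count(encode(s), pos)

    def rank_prefix(self, p, pos):
        """Elements among the first pos that have p as a prefix."""
        return self._count(encode(p)[:-1], pos)
```

For a sequence such as `["app/login", "app/logout", "db/read", "app/login", "db/write"]`, `access(i)` recovers the i-th string, `rank("app/login", 5)` counts its occurrences in the whole sequence, and `rank_prefix("app/", 5)` counts the elements beginning with `app/`. Because the encoding is prefix-free, an exact-match count is just a prefix count on the fully terminated encoding, which is why both queries share `_count` in this sketch.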

