Speeding-up $q$-gram mining on grammar-based compressed texts

Computer Science – Data Structures and Algorithms

Scientific paper

We present an efficient algorithm for calculating $q$-gram frequencies on strings represented in compressed form, namely, as a straight-line program (SLP). Given an SLP $\mathcal{T}$ of size $n$ that represents a string $T$, the algorithm computes the occurrence frequencies of all $q$-grams in $T$ by reducing the problem to the weighted $q$-gram frequencies problem on a trie-like structure of size $m = |T|-\mathit{dup}(q,\mathcal{T})$, where $\mathit{dup}(q,\mathcal{T})$ measures the redundancy that the SLP captures with respect to $q$-grams. The reduced problem can be solved in time linear in $m$. Since $m = O(qn)$, the running time of our algorithm is $O(\min\{|T|-\mathit{dup}(q,\mathcal{T}),qn\})$, improving our previous $O(qn)$ algorithm when $q = \Omega(|T|/n)$.
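As an illustration of counting $q$-grams directly on an SLP without decompressing $T$, here is a minimal Python sketch. It is not the trie-based reduction described above. It is a simpler boundary-counting scheme, assuming the SLP is given as a list `rules` in which each entry is either a single character ($X \to a$) or a pair `(y, z)` of indices of earlier rules ($X \to YZ$), with the last rule deriving $T$. For $q \ge 2$, every occurrence of a $q$-gram spans the $Y|Z$ boundary of exactly one node $X \to YZ$ of the derivation tree (the lowest node whose subtree contains it), so it suffices to count the $q$-grams of $\mathrm{suffix}(Y, q-1)\,\mathrm{prefix}(Z, q-1)$, each weighted by the number of times $X$ occurs in the derivation tree. The names `qgram_frequencies`, `rules`, and `_last` are hypothetical. Because each $q$-gram is hashed as a Python string at $O(q)$ cost, the sketch does not attain the bounds stated above.

```python
from collections import Counter
from typing import List, Tuple, Union

# Hypothetical SLP encoding (an assumption, not the paper's): a rule is either a
# single character (X -> a) or a pair (y, z) of indices of earlier rules (X -> YZ).
# The SLP is assumed non-empty, and its last rule derives T.
Rule = Union[str, Tuple[int, int]]


def _last(s: str, k: int) -> str:
    """Return the last k characters of s (the empty string when k == 0)."""
    return s[max(0, len(s) - k):]


def qgram_frequencies(rules: List[Rule], q: int) -> Counter:
    """Count all q-grams of the string derived by the last rule of the SLP."""
    if q < 1:
        raise ValueError("q must be at least 1")
    n, k = len(rules), q - 1

    # Prefix and suffix of length (at most) q-1 of every variable, bottom-up.
    pre: List[str] = [""] * n
    suf: List[str] = [""] * n
    for i, r in enumerate(rules):
        if isinstance(r, str):
            pre[i], suf[i] = r[:k], _last(r, k)
        else:
            y, z = r
            pre[i] = (pre[y] + pre[z])[:k]
            suf[i] = _last(suf[y] + suf[z], k)

    # vocc[i]: number of occurrences of variable i in the derivation tree of T,
    # propagated top-down (children always have smaller indices).
    vocc = [0] * n
    vocc[n - 1] = 1
    for i in range(n - 1, -1, -1):
        r = rules[i]
        if isinstance(r, tuple):
            vocc[r[0]] += vocc[i]
            vocc[r[1]] += vocc[i]

    freq: Counter = Counter()
    for i, r in enumerate(rules):
        if vocc[i] == 0:
            continue
        if isinstance(r, str):
            if q == 1:  # single characters are exactly the 1-grams
                freq[r] += vocc[i]
            continue
        y, z = r
        # Every q-gram of this window crosses the Y|Z boundary of rule i.
        window = suf[y] + pre[z]
        for s in range(len(window) - q + 1):
            freq[window[s:s + q]] += vocc[i]
    return freq
```

For example, the SLP `["a", "b", (0, 1), (2, 0), (3, 2)]` derives `abaab`, and `qgram_frequencies` on it with `q = 2` returns the counts `ab: 2`, `ba: 1`, `aa: 1`. Each rule contributes a window of at most $2(q-1)$ characters, which is the intuition behind an $O(qn)$-type bound; the abstract's improvement comes from exploiting redundancy shared across these windows, which this sketch does not attempt.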
