Computing q-gram Non-overlapping Frequencies on SLP Compressed Texts

Computer Science – Data Structures and Algorithms

Scientific paper

Length-$q$ substrings, or $q$-grams, can represent important characteristics of text data, and determining the frequencies of all $q$-grams contained in the data is an important problem with many applications in data mining and machine learning. In this paper, we consider the problem of computing the {\em non-overlapping frequencies} of all $q$-grams in a text given in compressed form, namely, as a straight line program (SLP). We show that the problem can be solved in $O(q^2n)$ time and $O(qn)$ space, where $n$ is the size of the SLP. This generalizes and greatly improves on previous work (Inenaga & Bannai, 2009), which solved the problem only for $q=2$ in $O(n^4\log n)$ time and $O(n^3)$ space.
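
To make the problem statement concrete, the following is a minimal sketch of a naive baseline, not the $O(q^2n)$ algorithm of the paper. It assumes the usual definition of the non-overlapping frequency of a $q$-gram as the maximum number of its occurrences that can be chosen so that no two overlap, and it assumes an SLP is given as a dictionary mapping each variable either to a single character or to a pair of variables. The function names `expand_slp` and `nonoverlapping_qgram_freqs` and the dictionary representation are hypothetical, chosen only for illustration. The sketch decompresses the text first, which can take time exponential in the size of the SLP, and that is exactly the cost the paper avoids.

```python
from collections import defaultdict

def expand_slp(rules, start):
    """Decompress an SLP into the string it derives.

    `rules` maps a variable name to a single character or to a pair
    (left_variable, right_variable). This representation is an assumption
    made for illustration; the output can be exponentially long.
    """
    memo = {}

    def derive(var):
        if var not in memo:
            rhs = rules[var]
            if isinstance(rhs, str):
                memo[var] = rhs
            else:
                memo[var] = derive(rhs[0]) + derive(rhs[1])
        return memo[var]

    return derive(start)

def nonoverlapping_qgram_freqs(text, q):
    """Count non-overlapping occurrences of every q-gram in `text`.

    Scans left to right and greedily takes an occurrence whenever it
    starts at or after the end of the last taken occurrence of the same
    q-gram. Because all occurrences have equal length q, this greedy
    choice yields the maximum number of non-overlapping occurrences.
    """
    counts = defaultdict(int)
    next_free = defaultdict(int)  # first position not covered by a taken occurrence
    for i in range(len(text) - q + 1):
        gram = text[i:i + q]
        if i >= next_free[gram]:
            counts[gram] += 1
            next_free[gram] = i + q
    return dict(counts)

# Hypothetical SLP deriving "ababa": X3 = ab, X4 = abab, X5 = ababa.
rules = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X2"),
    "X4": ("X3", "X3"),
    "X5": ("X4", "X1"),
}
text = expand_slp(rules, "X5")
freqs = nonoverlapping_qgram_freqs(text, 3)
```

In this example the $3$-gram `aba` occurs at positions 0 and 2 of `ababa`, but the two occurrences overlap, so its non-overlapping frequency is 1 rather than 2.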
