Locating regions in a sequence under density constraints

Computer Science – Data Structures and Algorithms

Scientific paper

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

15 pages, 8 figures

Scientific paper

Several biological problems require the identification of regions in a sequence where some feature occurs within a target density range: examples including the location of GC-rich regions, identification of CpG islands, and sequence matching. Mathematically, this corresponds to searching a string of 0s and 1s for a substring whose relative proportion of 1s lies between given lower and upper bounds. We consider the algorithmic problem of locating the longest such substring, as well as other related problems (such as finding the shortest substring or a maximal set of disjoint substrings). For locating the longest such substring, we develop an algorithm that runs in O(n) time, improving upon the previous best-known O(n log n) result. For the related problems we likewise improve time complexities and remove significant overhead from the prior state of the art. Practical testing verifies that our new algorithms enjoy significantly smaller time and memory footprints, and can process sequences that are orders of magnitude longer as a result.

No associations

LandOfFree

Say what you really think

Search LandOfFree.com for scientists and scientific papers. Rate them and share your experience with other people.

Rating

Locating regions in a sequence under density constraints does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.

If you have personal experience with Locating regions in a sequence under density constraints, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Locating regions in a sequence under density constraints will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFWR-SCP-O-321907

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.