The Distribution and Deposition Algorithm for Multiple Sequences Sets

Computer Science – Data Structures and Algorithms

Scientific paper


Details

15 pages, 7 figures; extended version of a conference paper presented at GIW 2006; revised version accepted by Journal of Combin


A sequences set is a mathematical model used in many applications. As the number of sequences grows, a single sequences set model becomes inadequate for the rapidly increasing problem sizes. For example, more and more text processing applications split a single large text file into multiple files before processing. For these applications, the underlying mathematical model is multiple sequences sets (MSS). Although MSS are increasingly used, there is little research on how to process them efficiently. To process multiple sequences sets, sequences are first distributed to different sets, and then the sequences in each set are processed. Deriving effective algorithms for MSS processing is both interesting and challenging. In this paper, we define cost functions and a performance ratio for analyzing the quality of synthesis sequences. Based on these, we formulate the problem of Process of Multiple Sequences Sets (PMSS). We first propose two greedy algorithms for the PMSS problem, which generalize algorithms for a single sequences set. Then, based on an analysis of the characteristics of multiple sequences sets, we propose the Distribution and Deposition (DDA) algorithm and the DDA* algorithm for the PMSS problem. In the DDA algorithm, sequences are first distributed to multiple sets according to their alphabet contents; the sequences in each set are then deposited by the deposition algorithm. The DDA* algorithm differs from DDA in that it distributes sequences by clustering based on sequence profiles. Experiments show that DDA and DDA* always output results with smaller costs than the other algorithms, and that DDA* outperforms DDA in most instances. The DDA and DDA* algorithms are also efficient in both time and space.
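
The two-phase idea described in the abstract (distribute, then deposit) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract does not give the distribution rule, the deposition algorithm, or the cost functions. The sketch assumes that "alphabet contents" can be approximated by each sequence's most frequent letter, swaps in the well-known majority-merge heuristic for building a common supersequence as the deposition step, and assumes the cost is the total length of the synthesis sequences. The names `distribute`, `deposit`, and `total_cost` are hypothetical.

```python
from collections import Counter


def distribute(sequences, k):
    """Phase 1 (assumed rule): bucket each sequence by its most frequent
    letter, folding letters into k sets. Not the paper's actual rule."""
    alphabet = sorted({ch for s in sequences for ch in s})
    index = {ch: i for i, ch in enumerate(alphabet)}
    sets = [[] for _ in range(k)]
    for s in sequences:
        if not s:
            sets[0].append(s)
            continue
        # Ties are broken toward the alphabetically smallest letter.
        dominant = max(sorted(set(s)), key=s.count)
        sets[index[dominant] % k].append(s)
    return sets


def deposit(sequences):
    """Phase 2 (swapped-in technique): majority-merge heuristic.
    Repeatedly emit the letter that the most sequences need next."""
    pos = [0] * len(sequences)
    out = []
    while True:
        fronts = Counter(s[p] for s, p in zip(sequences, pos) if p < len(s))
        if not fronts:
            break
        # Most frequent front letter; ties go to the smallest letter.
        letter = min(fronts, key=lambda ch: (-fronts[ch], ch))
        out.append(letter)
        pos = [p + 1 if p < len(s) and s[p] == letter else p
               for s, p in zip(sequences, pos)]
    return "".join(out)


def total_cost(sequences, k):
    """Assumed cost: summed length of the per-set synthesis sequences."""
    return sum(len(deposit(group)) for group in distribute(sequences, k))


if __name__ == "__main__":
    seqs = ["AAC", "ACA", "TTG", "TGT"]
    for group in distribute(seqs, 2):
        print(group, "->", deposit(group))
    print("total cost:", total_cost(seqs, 2))
```

Each string returned by `deposit` contains every sequence of its set as a subsequence, which is the property a synthesis sequence needs. Grouping sequences with similar letter content tends to shorten these strings, which is the intuition the abstract gives for distributing before depositing.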
