Sparse long blocks and the microstructure of the LCS
Mathematics – Probability
Scientific paper
2012-04-04
Consider two random strings of the same length, generated by i.i.d. sequences taking values uniformly in a finite alphabet. Artificially place a long block into one of the strings, where a block is a contiguous substring consisting of a single repeated symbol. The long block replaces a segment of equal length, and its length is smaller than the length of the strings but larger than the square root of that length. We show that, for sufficiently long strings, the optimal alignment corresponding to the Longest Common Subsequence (LCS) treats the added long block very differently depending on the size of the alphabet. For two-letter alphabets, the long block is mainly aligned with the same symbol from the other string, while for three or more letters the opposite is true and the long block is mainly aligned with gaps. We further provide simulation results on the proportion of gaps in blocks of various lengths. In our simulations, the blocks are "regular blocks" in an i.i.d. sequence and are not artificially added. Nonetheless, we observe for these natural blocks a phenomenon similar to the one shown for the artificially added blocks: with two letters, longer blocks are aligned with a smaller proportion of gaps, while for three or more letters the opposite is true. It thus appears that the microscopic nature of optimal alignments is entirely different for two-letter and three-letter alphabets.
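The planted-block setup can be illustrated with a small simulation. The following Python sketch is not the authors' code. It computes an LCS by standard dynamic programming, recovers one optimal alignment by traceback, plants a block of the symbol 0 in the middle of `x`, and reports the fraction of the block's positions that are aligned with gaps. The names `lcs_alignment` and `block_gap_fraction`, the block symbol and placement, the sizes n = 400 and block length 100, and the tie-breaking flag `gap_x_on_tie` are all illustrative assumptions. Optimal alignments are generally not unique, so the measured gap fraction can depend on how traceback ties are broken; the sketch reports the result under two tie-breaking rules.

```python
import random


def lcs_alignment(x, y, gap_x_on_tie=True):
    """Return (LCS length, pairs) for one optimal LCS alignment of x and y.

    pairs lists the aligned index pairs (i, j) with x[i] == y[j]; every
    other position of x or y is aligned with a gap.
    """
    n, m = len(x), len(y)
    # L[i][j] = LCS length of the suffixes x[i:] and y[j:].
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        row, below = L[i], L[i + 1]
        for j in range(m - 1, -1, -1):
            if x[i] == y[j]:
                row[j] = below[j + 1] + 1
            else:
                row[j] = max(below[j], row[j + 1])
    # Forward traceback. Matching equal symbols is always optimal for LCS;
    # ties between the two gap moves are broken by the (assumed) flag.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if x[i] == y[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        else:
            down, right = L[i + 1][j], L[i][j + 1]
            if down > right or (down == right and gap_x_on_tie):
                i += 1  # x[i] is aligned with a gap
            else:
                j += 1  # y[j] is aligned with a gap
    return L[0][0], pairs


def block_gap_fraction(n, k, block_len, seed=0, gap_x_on_tie=True):
    """Hypothetical helper: plant a block in x and measure its gap fraction.

    x and y are i.i.d. uniform on {0, ..., k-1}; a block of block_len copies
    of the symbol 0 replaces the middle segment of x (an assumed placement).
    """
    rng = random.Random(seed)
    x = [rng.randrange(k) for _ in range(n)]
    y = [rng.randrange(k) for _ in range(n)]
    start = (n - block_len) // 2
    x[start:start + block_len] = [0] * block_len
    _, pairs = lcs_alignment(x, y, gap_x_on_tie)
    matched = {i for i, _ in pairs}
    gaps = sum(1 for i in range(start, start + block_len) if i not in matched)
    return gaps / block_len


if __name__ == "__main__":
    n, block_len = 400, 100  # sqrt(400) = 20 < 100 < 400
    for k in (2, 3, 4):
        a = block_gap_fraction(n, k, block_len, seed=1, gap_x_on_tie=True)
        b = block_gap_fraction(n, k, block_len, seed=1, gap_x_on_tie=False)
        print(f"k={k}: block gap fraction {a:.2f} (gap x on ties), "
              f"{b:.2f} (gap y on ties)")
```

A single run at n = 400 is only a finite-size illustration. The result stated above concerns sufficiently long strings, so a small simulation need not reproduce it, and any quantitative statement would require averaging over many seeds and larger n.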
Authors: S. Amsalu, Christian Houdré, Heinrich Matzinger