Information Theory of DNA Sequencing

Computer Science – Information Theory

Scientific paper

Details

33 pages, 10 figures

DNA sequencing is the basic workhorse of modern-day biology and medicine. Shotgun sequencing is the dominant technique: many short fragments, called reads, are extracted from randomly located positions in the DNA sequence, and these reads are then assembled to reconstruct the original sequence. A basic question is: given a sequencing technology and the statistics of the DNA sequence, what is the minimum number of reads required for reliable reconstruction? This number provides a fundamental limit on the performance of any assembly algorithm. By drawing an analogy between the DNA sequencing problem and the classic communication problem, we formulate this question in terms of an information-theoretic notion of sequencing capacity, defined as the asymptotic ratio of the length of the DNA sequence to the minimum number of reads required to reconstruct it reliably. We compute the sequencing capacity explicitly for a simple statistical model of the DNA sequence and the read process. Using this framework, we also study the impact of noise in the read process on the sequencing capacity.
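To make the "length divided by number of reads" ratio concrete, here is a minimal toy simulation of the read process. It is not the paper's model or method; it assumes a circular sequence of length G, error-free reads of fixed length L, and read start positions drawn uniformly at random, and all function names are hypothetical. It only measures how many reads are needed to cover every position at least once. Since a base that no read touches cannot be reconstructed, coverage is a necessary condition, so the ratio G/N reported here is an upper bound on what any assembler could achieve under these assumptions. It ignores the sequence statistics that the abstract says also matter.

```python
import math
import random


def reads_until_covered(G: int, L: int, rng: random.Random) -> int:
    """Draw reads of length L with uniformly random start positions on a
    circular sequence of length G (toy assumption) until every position
    is covered at least once; return how many reads were needed."""
    covered = bytearray(G)  # 1 if the position has been covered by some read
    remaining = G           # number of positions still uncovered
    n_reads = 0
    while remaining > 0:
        start = rng.randrange(G)
        n_reads += 1
        for offset in range(L):
            pos = (start + offset) % G  # wrap around: circular-sequence assumption
            if not covered[pos]:
                covered[pos] = 1
                remaining -= 1
    return n_reads


def empirical_min_reads(G: int, L: int, target: float = 0.95,
                        trials: int = 200, seed: int = 0) -> int:
    """Smallest N such that at least a `target` fraction of simulated runs
    reached full coverage using N reads or fewer."""
    rng = random.Random(seed)
    counts = sorted(reads_until_covered(G, L, rng) for _ in range(trials))
    return counts[math.ceil(target * trials) - 1]


def union_bound_reads(G: int, L: int, eps: float = 0.05) -> int:
    """Smallest N with G * (1 - L/G)**N <= eps (requires 0 < L < G).
    One read misses a fixed position with probability 1 - L/G, so by the
    union bound N reads leave some position uncovered with probability <= eps."""
    return math.ceil(math.log(eps / G) / math.log(1 - L / G))


if __name__ == "__main__":
    G, L = 1000, 50  # hypothetical toy sizes, far smaller than a real genome
    n_emp = empirical_min_reads(G, L, target=0.95)
    n_ub = union_bound_reads(G, L, eps=0.05)
    print(f"empirical N ~ {n_emp}, ratio G/N ~ {G / n_emp:.2f}")
    print(f"union-bound N = {n_ub}, ratio G/N = {G / n_ub:.2f}")
```

The empirical estimate and the closed-form union bound should land in the same range. Both grow roughly like (G/L) ln G, which is why the read count, not just G/L, drives the ratio.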
