Reference Based Genome Compression

Computer Science – Information Theory

Scientific paper

5 pages; Submitted to the IEEE Information Theory Workshop (ITW) 2012

DNA sequencing technology has advanced to a point where storage is becoming the central bottleneck in the acquisition and mining of more data. Large amounts of data are vital for genomics research, and generic compression tools, while viable, cannot offer the same savings as approaches tuned to inherent biological properties. We propose an algorithm to compress a target genome given a known reference genome. The proposed algorithm first generates a mapping from the reference to the target genome, and then compresses this mapping with an entropy coder. As an illustration of the performance: applying our algorithm to James Watson's genome with hg18 as a reference, we are able to reduce the 2991 megabyte (MB) genome down to 6.99 MB, while Gzip compresses it to 834.8 MB.
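The abstract describes a two-stage scheme. First, map the target genome onto the reference. Second, entropy-code that mapping. The sketch below is a minimal illustration under stated assumptions. It is not the authors' algorithm. It uses Python's difflib.SequenceMatcher, a generic matcher that is quadratic in the worst case and far too slow for real chromosomes, to express the target as hypothetical COPY/INS operations against the reference. It then substitutes a simple order-0 Huffman code for the entropy coder, whose type the abstract does not specify. The operation format and all function names are illustrative only.

```python
import heapq
from collections import Counter
from difflib import SequenceMatcher


def map_target_to_reference(reference: str, target: str):
    """Hypothetical mapping: describe target as copies from reference plus literal insertions."""
    ops = []
    matcher = SequenceMatcher(None, reference, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("COPY", i1, i2 - i1))  # reuse reference[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("INS", target[j1:j2]))  # literal bases not taken from reference
        # "delete": reference bases are skipped; COPY offsets already encode that
    return ops


def reconstruct(reference: str, ops) -> str:
    """Decoder side: rebuild the target from the reference and the mapping."""
    parts = []
    for op in ops:
        if op[0] == "COPY":
            _, start, length = op
            parts.append(reference[start:start + length])
        else:
            parts.append(op[1])
    return "".join(parts)


def to_symbols(ops):
    """Flatten the mapping into a symbol stream for the entropy coder."""
    symbols = []
    for op in ops:
        if op[0] == "COPY":
            symbols += ["COPY", op[1], op[2]]
        else:
            symbols += ["INS", len(op[1])] + list(op[1])
    return symbols


def huffman_code_lengths(symbols):
    """Order-0 Huffman code lengths (stand-in for the unspecified entropy coder)."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tiebreak, {symbol: depth in subtree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


if __name__ == "__main__":
    reference = "ACGTACGTTAGCCGATAGGCTAACGT"
    target = "ACGTACGTTAGCAGATAGGCTAACGTTT"
    ops = map_target_to_reference(reference, target)
    assert reconstruct(reference, ops) == target
    symbols = to_symbols(ops)
    lengths = huffman_code_lengths(symbols)
    bits = sum(lengths[s] for s in symbols)
    print(ops)
    print(f"raw: {2 * len(target)} bits at 2 bits/base; mapping: {bits} bits")
```

The bit count ignores the cost of transmitting the Huffman table itself. On a toy pair this short, the mapping need not beat 2 bits per base. The savings the abstract reports come from genome-scale inputs, where long shared stretches collapse into single COPY operations.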
