Computer Science – Information Theory
Scientific paper
2012-04-09
5 pages; Submitted to the IEEE Information Theory Workshop (ITW) 2012
DNA sequencing technology has advanced to a point where storage is becoming the central bottleneck in the acquisition and mining of more data. Large amounts of data are vital for genomics research, and generic compression tools, while viable, cannot offer the same savings as approaches tuned to inherent biological properties. We propose an algorithm to compress a target genome given a known reference genome. The proposed algorithm first generates a mapping from the reference to the target genome, and then compresses this mapping with an entropy coder. To illustrate its performance: applied to James Watson's genome with hg18 as the reference, our algorithm reduces the 2991 megabyte (MB) genome to 6.99 MB, whereas Gzip compresses it only to 834.8 MB.
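The two-stage idea in the abstract (build a mapping from the reference to the target, then entropy-code the mapping) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's method. The greedy k-mer matcher, the copy/literal operation format, the JSON serialization, and every function name (`build_mapping`, `apply_mapping`, `compress_target`, `decompress_target`) are hypothetical. The standard-library `zlib` (DEFLATE) stands in for the unspecified entropy coder.

```python
import json
import zlib


def build_mapping(reference, target, k=4):
    """Greedily describe `target` as copies from `reference` plus literal bases.

    Hypothetical mapping format (an assumption, not taken from the paper):
      ["C", ref_pos, length]  copy `length` bases starting at `ref_pos`
      ["L", base]             emit one base that was not matched
    """
    # Index the first occurrence of every k-mer in the reference.
    index = {}
    for i in range(len(reference) - k + 1):
        index.setdefault(reference[i:i + k], i)

    ops = []
    t = 0
    while t < len(target):
        r = index.get(target[t:t + k])
        if r is None:
            # No k-mer match at this position: emit the base verbatim.
            ops.append(["L", target[t]])
            t += 1
            continue
        # Extend the match for as long as both sequences agree.
        length = k
        while (r + length < len(reference)
               and t + length < len(target)
               and reference[r + length] == target[t + length]):
            length += 1
        ops.append(["C", r, length])
        t += length
    return ops


def apply_mapping(reference, ops):
    """Rebuild the target from the reference and a list of operations."""
    out = []
    for op in ops:
        if op[0] == "C":
            _, r, length = op
            out.append(reference[r:r + length])
        else:
            out.append(op[1])
    return "".join(out)


def compress_target(reference, target):
    """Stage 1: build the mapping. Stage 2: compress it (zlib as a stand-in)."""
    ops = build_mapping(reference, target)
    payload = json.dumps(ops, separators=(",", ":")).encode("ascii")
    return zlib.compress(payload, 9)


def decompress_target(reference, blob):
    """Invert compress_target; the decoder needs the same reference."""
    ops = json.loads(zlib.decompress(blob).decode("ascii"))
    return apply_mapping(reference, ops)


# Toy usage: a target that differs from the reference by one substitution
# and one short insertion.
reference = "ACGTTGCAAGGCTTAACCGGATATCGCGTA"
target = reference[:10] + "T" + reference[11:20] + "GG" + reference[20:]
blob = compress_target(reference, target)
assert decompress_target(reference, blob) == target
```

The sketch shows why a reference helps: long stretches shared with the reference collapse into a handful of copy operations, so only the differences need to be stored. For scale, the figures quoted in the abstract work out to a compression ratio of roughly 428:1 for the reference-based approach, against about 3.6:1 for Gzip.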
Chern Bobbie
Manolakos Alexandros
No Albert
Ochoa Idoia
Venkat Kartik
Reference Based Genome Compression