Succinct Data Structures for Assembling Large Genomes

Biology – Quantitative Biology – Genomics

Scientific paper

Motivation: Second-generation sequencing technology makes it feasible for many researchers to obtain enough sequence reads to attempt the de novo assembly of higher eukaryotes (including mammals). De novo assembly not only provides a tool for understanding wide-scale biological variation, but within human biomedicine it also offers a direct way of observing both large-scale structural variation and fine-scale sequence variation. Unfortunately, improvements in the computational feasibility of de novo assembly have not matched the improvements in the gathering of sequence data. This is for two reasons: the inherent computational complexity of the problem, and the in-practice memory requirements of tools.

Results: In this paper we use entropy-compressed, or succinct, data structures to create a practical representation of the de Bruijn assembly graph that requires at least a factor of 10 less storage than the kinds of structures used by deployed methods. In particular, we show that when stored succinctly, the de Bruijn assembly graph for Homo sapiens requires only 23 gigabytes of storage. Moreover, because our representation is entropy compressed, in the presence of sequencing errors it has better asymptotic scaling behaviour than conventional approaches.
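To make the idea of a compact de Bruijn graph more concrete, the following is a minimal Python sketch and not the paper's implementation. It assumes that each edge is a (k+1)-mer packed into an integer at 2 bits per base, and it uses a sorted list of those integers as a stand-in for a compressed sparse bitmap with a rank operation. The names `SparseEdgeSet`, `rank`, `successors` and `min_bits` are hypothetical, and reverse complements are ignored for brevity. The helper `min_bits` evaluates the standard information-theoretic lower bound of log2 C(n, m) bits for storing m items drawn from a universe of n, which is the kind of bound an entropy-compressed structure aims to approach.

```python
from bisect import bisect_left
from math import comb, log2

BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}


def encode(s):
    """Pack a DNA string into an integer, 2 bits per base."""
    v = 0
    for ch in s:
        v = (v << 2) | CODE[ch]
    return v


def decode(v, length):
    """Unpack an integer back into a DNA string of the given length."""
    out = []
    for _ in range(length):
        out.append(BASES[v & 3])
        v >>= 2
    return "".join(reversed(out))


class SparseEdgeSet:
    """Edges ((k+1)-mers) kept as sorted integers.

    Assumption: the sorted list stands in for a compressed sparse bitmap;
    a real succinct structure would answer rank() without storing
    each integer explicitly.
    """

    def __init__(self, reads, k):
        self.k = k
        edges = set()
        for r in reads:
            for i in range(len(r) - k):
                edges.add(encode(r[i:i + k + 1]))
        self.edges = sorted(edges)

    def rank(self, x):
        """Number of stored edges whose code is strictly less than x."""
        return bisect_left(self.edges, x)

    def successors(self, node):
        """k-mers reachable from `node` by following one stored edge."""
        base = encode(node) << 2  # the four candidate edges are base..base+3
        lo, hi = self.rank(base), self.rank(base + 4)
        mask = (1 << (2 * self.k)) - 1  # keeps the last k bases of an edge
        return [decode(e & mask, self.k) for e in self.edges[lo:hi]]


def min_bits(universe, members):
    """Lower bound in bits for storing a `members`-subset of `universe` items."""
    return log2(comb(universe, members))


if __name__ == "__main__":
    g = SparseEdgeSet(["ACGTAC", "ACGTT"], k=3)
    print(g.successors("CGT"))
    print(min_bits(4 ** 4, len(g.edges)))
```

Because the four possible outgoing edges of a node occupy adjacent codes, two rank queries are enough to find them. This is the property that lets a bitmap-with-rank representation act as a graph without explicit pointers.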

Profile ID: LFWR-SCP-O-176413
