Towards a theoretical understanding of false positives in DNA motif finding

Biology – Quantitative Biology – Genomics

Scientific paper

Details

Submitted to PLOS Computational Biology

Detection of false-positive motifs is one of the main causes of low performance in motif-finding methods. It is generally assumed that false positives are mostly due to algorithmic weaknesses of motif finders. Here, however, we derive the theoretical dependence of false positives on dataset size and find that false positives can arise as a result of large dataset size, irrespective of the algorithm used. Interestingly, the false-positive strength depends more on the number of sequences in the dataset than on the sequence length. As expected, false positives can be reduced by decreasing the sequence length or by adding more sequences to the dataset. The dependence on the number of sequences, however, diminishes and reaches a plateau, after which adding more sequences to the dataset does not reduce the false-positive rate significantly. Based on the theoretical results presented here, we provide a number of intuitive rules of thumb that may be used to enhance motif-finding results in practice.
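
The dependence on dataset size described above can be explored empirically with a small simulation. The sketch below is not the paper's derivation: it assumes uniformly random DNA, a fixed motif width, and a simple greedy seed-and-match search standing in for a real motif finder. The function names (`information_content`, `strongest_spurious_motif`, `mean_strength`) and all parameter values are hypothetical choices made for illustration. It measures the information content, in bits, of the strongest motif found in sequences that contain no real motif.

```python
import math
import random


def information_content(sites):
    """Total information content (bits) of aligned, equal-length sites,
    measured against a uniform background (assumption: p = 0.25 per base)."""
    n = len(sites)
    total = 0.0
    for column in zip(*sites):
        for base in "ACGT":
            p = column.count(base) / n
            if p > 0:
                total += p * math.log2(p / 0.25)
    return total


def strongest_spurious_motif(n_seqs, seq_len, width, rng):
    """Greedy stand-in for a motif finder (an assumption, not the paper's method).

    Every window of the first sequence is tried as a seed; in each other
    sequence the window closest to the seed in Hamming distance is picked.
    Returns the highest information content over all seeds.
    """
    seqs = ["".join(rng.choice("ACGT") for _ in range(seq_len))
            for _ in range(n_seqs)]
    n_windows = seq_len - width + 1
    best = 0.0
    for start in range(n_windows):
        seed = seqs[0][start:start + width]
        sites = [seed]
        for s in seqs[1:]:
            windows = (s[i:i + width] for i in range(n_windows))
            sites.append(min(
                windows,
                key=lambda win: sum(a != b for a, b in zip(win, seed)),
            ))
        best = max(best, information_content(sites))
    return best


def mean_strength(n_seqs, seq_len, width=6, trials=3, seed=0):
    """Average strength of the best spurious motif over a few random datasets."""
    rng = random.Random(seed)
    total = sum(strongest_spurious_motif(n_seqs, seq_len, width, rng)
                for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Vary the number of sequences at fixed length, then the length at
    # fixed number of sequences, and print the resulting strengths.
    for n in (5, 10, 20):
        print("sequences:", n, "length: 40 ->", round(mean_strength(n, 40), 2))
    for length in (20, 40, 80):
        print("sequences: 10 length:", length, "->",
              round(mean_strength(10, length), 2))
```

Because every sequence here is random, any motif the search reports is a false positive by construction, so the printed values give a rough sense of how strong a spurious motif can look for a given number of sequences and sequence length. The greedy search only approximates the best possible alignment, so the numbers are indicative rather than exact.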
