Scientific paper
2010-12-22
Biology
Quantitative Biology
Genomics
Submitted to PLOS Computational Biology
The detection of false-positive motifs is one of the main causes of poor performance in motif-finding methods. It is generally assumed that false positives are mostly due to algorithmic weaknesses of motif finders. Here, however, we derive the theoretical dependence of false positives on dataset size and find that false positives can arise as a result of large dataset size, irrespective of the algorithm used. Interestingly, the strength of false positives depends more on the number of sequences in the dataset than on the sequence length. As expected, false positives can be reduced by decreasing the sequence length or by adding more sequences to the dataset. The dependence on the number of sequences, however, diminishes and reaches a plateau, after which adding more sequences to the dataset does not significantly reduce the false-positive rate. Based on the theoretical results presented here, we provide a number of intuitive rules of thumb that may be used to improve motif-finding results in practice.
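The qualitative trends stated in the abstract can be explored with a small simulation. The sketch below is not the authors' derivation. It assumes uniformly random DNA and uses exact k-mer matching as a crude stand-in for a motif model; the function names and parameter values are hypothetical. It measures the fraction of purely random sequences that share the single most widespread k-mer, which serves here as a rough proxy for the strength of a spurious motif.

```python
import random
from collections import Counter


def random_dna(length, rng):
    """Return a uniformly random DNA string (assumption: no base bias)."""
    return "".join(rng.choices("ACGT", k=length))


def spurious_kmer_coverage(num_seqs, seq_len, k, rng):
    """Fraction of random sequences containing the most widely shared k-mer.

    Each k-mer is counted at most once per sequence, so the result is the
    share of sequences that a motif finder could attribute to the best
    chance 'motif' under exact matching.
    """
    counts = Counter()
    for _ in range(num_seqs):
        seq = random_dna(seq_len, rng)
        # Set comprehension: one vote per sequence for each distinct k-mer.
        counts.update({seq[i:i + k] for i in range(seq_len - k + 1)})
    return max(counts.values()) / num_seqs


if __name__ == "__main__":
    rng = random.Random(0)
    k = 6
    # Vary sequence length at a fixed number of sequences.
    for seq_len in (50, 200, 1000):
        print("length", seq_len, spurious_kmer_coverage(20, seq_len, k, rng))
    # Vary the number of sequences at a fixed sequence length.
    for num_seqs in (10, 50, 200):
        print("sequences", num_seqs, spurious_kmer_coverage(num_seqs, 200, k, rng))
```

Under these assumptions, longer sequences should yield higher chance coverage, while adding sequences should lower it with diminishing effect. A real motif finder scores degenerate matches against a background model, so the numbers from this sketch are only indicative.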
Moses Alan M.
Zia Amin
Towards a theoretical understanding of false positives in DNA motif finding