Statistics – Applications
Scientific paper
2011-01-05
Annals of Applied Statistics 2010, Vol. 4, No. 4, 1660-1697
Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/10-AOAS363 by the Ins
10.1214/10-AOAS363
Large-scale statistical analysis of data sets associated with genome sequences plays an important role in modern biology. A key component of such statistical analyses is the computation of $p$-values and confidence bounds for statistics defined on the genome. Currently such computation is commonly achieved through ad hoc simulation measures. The method of randomization, which is at the heart of these simulation procedures, can significantly affect the resulting statistical conclusions. Most simulation schemes introduce a variety of hidden assumptions regarding the nature of the randomness in the data, resulting in a failure to capture biologically meaningful relationships. To address the need for a method of assessing the significance of observations within large-scale genomic studies, where there often exists a complex dependency structure between observations, we propose a unified solution built upon a data subsampling approach. We propose a piecewise stationary model for genome sequences and show that the subsampling approach gives correct answers under this model. We illustrate the method on three simulation studies and two real data examples.
Bickel Peter J.
Boley Nathan
Brown James B.
Huang Haiyan
Zhang Nancy R.
Subsampling Methods for genomic inference