Statistical methodology for massive datasets and model selection

Computer Science – Performance

Scientific paper

Astronomy is facing a revolution in the collection, storage, analysis, and interpretation of large datasets. The data volumes are several orders of magnitude larger than what astronomers and statisticians are used to dealing with, and the old methods simply do not work. The National Virtual Observatory (NVO) initiative has recently emerged in recognition of this need, to federate numerous large digital sky archives, both ground-based and space-based, and to develop tools to explore and understand these vast volumes of data. In this paper, we address some of the critically important statistical challenges raised by the NVO. In particular, a low-storage, single-pass, sequential method for simultaneous estimation of multiple quantiles for massive datasets is presented. Density estimation based on this procedure and a multivariate extension are also discussed. The NVO also requires statistical tools to analyze moderate-size databases. Model selection is an important issue for many astrophysical databases. We present a simple likelihood-based 'leave one out' method to select the best among several possible alternatives. The performance of the method is compared to that of methods based on the Akaike Information Criterion and the Bayesian Information Criterion.
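The abstract does not spell out the 'leave one out' procedure, so the following is only a minimal sketch of one common reading: each candidate model is refit with one observation held out, the held-out point is scored by its log-likelihood, and the scores are summed. The sketch sets this beside the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L. The two candidate models (Gaussian and exponential), the function names, and the sample values are assumptions chosen for illustration and are not taken from the paper.

```python
import math

def gaussian_fit(xs):
    """Maximum-likelihood Gaussian fit: returns (mean, variance)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

def gaussian_logpdf(x, params):
    mu, var = params
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def exponential_fit(xs):
    """Maximum-likelihood exponential fit: returns (rate,)."""
    return (len(xs) / sum(xs),)

def exponential_logpdf(x, params):
    (rate,) = params
    return math.log(rate) - rate * x

# Hypothetical candidate set: name -> (fit function, log-density, parameter count k).
MODELS = {
    "gaussian": (gaussian_fit, gaussian_logpdf, 2),
    "exponential": (exponential_fit, exponential_logpdf, 1),
}

def score(xs, fit, logpdf, k):
    """Return in-sample log-likelihood, AIC, BIC, and leave-one-out log-likelihood."""
    n = len(xs)
    params = fit(xs)
    loglik = sum(logpdf(x, params) for x in xs)
    aic = 2 * k - 2 * loglik            # lower is better
    bic = k * math.log(n) - 2 * loglik  # lower is better
    loo = 0.0                           # higher is better
    for i in range(n):
        rest = xs[:i] + xs[i + 1:]      # refit without observation i
        loo += logpdf(xs[i], fit(rest))
    return {"loglik": loglik, "aic": aic, "bic": bic, "loo": loo}

def compare(xs):
    """Score every candidate model on the same data."""
    return {name: score(xs, *spec) for name, spec in MODELS.items()}

if __name__ == "__main__":
    sample = [0.3, 1.2, 0.7, 2.5, 0.1, 1.9, 0.4, 3.1]  # made-up positive values
    for name, result in compare(sample).items():
        print(name, result)
```

Under this reading, the model with the largest leave-one-out log-likelihood would be preferred, while AIC and BIC prefer the smallest value. The loop refits each model n times, which is affordable for the moderate-size databases the abstract mentions but not for the massive ones.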
