How to do Statistics and Machine Learning on Very Large Survey Datasets

Statistics – Computation

Scientific paper

I will describe algorithms and data structures that allow the most powerful machine learning methods, which often scale quadratically or even cubically with the number of data points, to run many orders of magnitude faster than naive implementations. Such techniques can make previously impossible statistical analyses tractable at the scale of entire sky surveys. I will discuss scalable algorithms we have developed for n-point correlations, friends-of-friends, nearest-neighbors, kernel density estimation, nonparametric Bayes classification, principal component analysis, local linear regression, isometric non-negative matrix factorization, hidden Markov models, k-means, support vector machine-like classifiers, Gaussian process regression, and Gaussian graphical model inference, among others. In addition to techniques inspired by computational geometry, fast multipole methods, and Monte Carlo integration, we employ a distributed framework that can be thought of as a higher-order version of Google's MapReduce. Our algorithms have enabled several first-of-a-kind large-scale analyses by our collaborators in astrophysics and in other fields.
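The abstract does not spell out any of the algorithms, so the following is only an illustrative sketch of the general idea behind the computational-geometry techniques it mentions, not the authors' implementation. It counts point pairs closer than a radius r (the basic quantity behind a two-point correlation estimate) using a kd-tree whose bounding boxes let whole groups of points be accepted or rejected without examining them one by one. All names here (Node, build, count_within, pair_count_tree, pair_count_brute) are hypothetical and chosen for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    points: List[Point]            # all points in this subtree (simple, not memory-optimal)
    lo: Point                      # bounding-box minimum per dimension
    hi: Point                      # bounding-box maximum per dimension
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: List[Point], leaf_size: int = 16) -> Node:
    """Build a kd-tree by splitting at the median of the widest dimension."""
    dim = len(points[0])
    lo = tuple(min(p[d] for p in points) for d in range(dim))
    hi = tuple(max(p[d] for p in points) for d in range(dim))
    node = Node(points, lo, hi)
    if len(points) > leaf_size:
        axis = max(range(dim), key=lambda d: hi[d] - lo[d])
        pts = sorted(points, key=lambda p: p[axis])
        mid = len(pts) // 2
        node.left = build(pts[:mid], leaf_size)
        node.right = build(pts[mid:], leaf_size)
    return node


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def min_dist2(q: Point, node: Node) -> float:
    """Squared distance from q to the closest point of the node's box."""
    return sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, node.lo, node.hi))


def max_dist2(q: Point, node: Node) -> float:
    """Squared distance from q to the farthest corner of the node's box."""
    return sum(max(x - l, h - x) ** 2 for x, l, h in zip(q, node.lo, node.hi))


def count_within(node: Node, q: Point, r2: float) -> int:
    """Count points in the subtree whose squared distance to q is <= r2."""
    if min_dist2(q, node) > r2:
        return 0                      # whole box is too far away: prune
    if max_dist2(q, node) <= r2:
        return len(node.points)       # whole box is inside: count in bulk
    if node.left is None:             # leaf: fall back to direct comparison
        return sum(dist2(q, p) <= r2 for p in node.points)
    return count_within(node.left, q, r2) + count_within(node.right, q, r2)


def pair_count_tree(points: List[Point], r: float, leaf_size: int = 16) -> int:
    """Number of unordered pairs with separation <= r, using the kd-tree."""
    root = build(points, leaf_size)
    r2 = r * r
    total = sum(count_within(root, p, r2) for p in points)
    # Every point finds itself once, and every pair is found from both ends.
    return (total - len(points)) // 2


def pair_count_brute(points: List[Point], r: float) -> int:
    """Reference O(n^2) pair count, for checking the tree version."""
    r2 = r * r
    n = len(points)
    return sum(dist2(points[i], points[j]) <= r2
               for i in range(n) for j in range(i + 1, n))


if __name__ == "__main__":
    rng = random.Random(0)
    cloud = [(rng.random(), rng.random()) for _ in range(2000)]
    print(pair_count_tree(cloud, 0.05), pair_count_brute(cloud, 0.05))
```

The brute-force function compares every pair and therefore scales quadratically, which is the kind of cost the abstract refers to. The tree version gives the same count but skips most comparisons when r is small relative to the extent of the data. Dual-tree traversals, approximation with error bounds, and distribution across machines would be further steps that this sketch does not attempt.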
