Get the Most out of Your Sample: Optimal Unbiased Estimators using Partial Information

Computer Science – Databases

Scientific paper


Details

This is a full version of a PODS 2011 paper


Random sampling is an essential tool in the processing and transmission of data. It is used to summarize data that is too large to store or manipulate and to meet resource constraints on bandwidth or battery power. Estimators applied to the sample facilitate fast approximate processing of queries posed over the original data, and the value of the sample hinges on the quality of these estimators. Our work targets data sets such as request and traffic logs and sensor measurements, where data is repeatedly collected over multiple instances: time periods, locations, or snapshots. We are interested in queries that span multiple instances, such as distinct counts and distance measures over selected records. These queries are used for applications ranging from planning to anomaly and change detection. Unbiased low-variance estimators are particularly effective, as the relative error decreases with the number of selected record keys. The Horvitz-Thompson estimator, known to minimize variance for sampling with "all or nothing" outcomes (which reveal either the exact value or no information about the estimated quantity), is not optimal for multi-instance operations, for which an outcome may provide partial information. We present a general principled methodology for the derivation of (Pareto) optimal unbiased estimators over sampled instances and aim to understand its potential. We demonstrate significant improvement in estimate accuracy of fundamental queries for common sampling schemes.
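The gap between "all or nothing" and partial-information outcomes can be illustrated with a small sketch. It assumes a setting the abstract does not spell out: two instances hold values v1 and v2 for one key, each value is sampled independently with a known probability p, and the query is max(v1, v2). The function names are hypothetical, and the second estimator is a simple inclusion-exclusion construction chosen for illustration; it is not claimed to be the paper's estimator or to be Pareto optimal (for p below 1/2 it can even return negative values).

```python
from fractions import Fraction
from itertools import product


def ht_max(v, sampled, p):
    """Horvitz-Thompson estimate of max(v).

    Positive only when every entry is sampled, i.e. when the outcome
    reveals the exact maximum; otherwise the estimate is 0.
    """
    if all(sampled):
        return max(v) / p ** len(v)
    return Fraction(0)


def partial_max(v, sampled, p):
    """Illustrative estimator that also uses partial outcomes (two values).

    Uses max(v1, v2) = v1 + v2 - min(v1, v2): each sampled value is
    scaled by 1/p, and the minimum, known only when both values are
    sampled, is scaled by 1/p^2.
    """
    est = sum((x / p for x, s in zip(v, sampled) if s), Fraction(0))
    if all(sampled):
        est -= min(v) / p ** 2
    return est


def mean_and_variance(estimator, v, p):
    """Exact mean and variance, enumerating every sampling outcome."""
    mean = Fraction(0)
    second_moment = Fraction(0)
    for sampled in product([False, True], repeat=len(v)):
        prob = Fraction(1)
        for s in sampled:
            prob *= p if s else 1 - p
        est = estimator(v, sampled, p)
        mean += prob * est
        second_moment += prob * est * est
    return mean, second_moment - mean ** 2


if __name__ == "__main__":
    values, p = (3, 1), Fraction(1, 2)
    print("HT:     ", mean_and_variance(ht_max, values, p))
    print("partial:", mean_and_variance(partial_max, values, p))
```

Under these assumptions, with values (3, 1) and p = 1/2, both estimators have mean 3 (the true maximum), but the variance drops from 27 for Horvitz-Thompson to 5 for the estimator that also uses outcomes where only one value is sampled.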

