Sampling to estimate arbitrary subset sums

Computer Science – Data Structures and Algorithms

Scientific paper


Starting with a set of weighted items, we want to create a generic sample of a certain size that we can later use to estimate the total weight of arbitrary subsets. For this purpose, we propose priority sampling, which, tested on Internet data, performed better than previous methods by orders of magnitude. Priority sampling is simple to define and implement: we consider a stream of items i=0,...,n-1 with weights w_i. For each item i, we generate a random number r_i in (0,1) and create a priority q_i=w_i/r_i. The sample S consists of the k highest-priority items. Let t be the (k+1)th highest priority. Each sampled item i in S gets a weight estimate W_i=max{w_i,t}, while non-sampled items get weight estimate W_i=0. Magically, it turns out that the weight estimates are unbiased, that is, E[W_i]=w_i, and by linearity of expectation, we get unbiased estimators over any subset sum simply by adding the sampled weight estimates from the subset. Also, we can estimate the variance of the estimates, and surprisingly, there is no covariance between different weight estimates W_i and W_j. We conjecture an extremely strong near-optimality, namely that for any weight sequence, there exists no specialized scheme for sampling k items with unbiased estimators that gets smaller total variance than priority sampling with k+1 items. Very recently, Mario Szegedy has settled this conjecture.
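The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `priority_sample` and `estimate_subset_sum` are hypothetical, weights are assumed to be strictly positive, and r_i is drawn from (0,1] rather than the open interval (0,1) so that division by zero cannot occur. When there are at most k items, the sketch assumes a threshold of t=0, so every item keeps its exact weight.

```python
import heapq
import random


def priority_sample(weights, k, rng=None):
    """Sketch of priority sampling over a list of positive weights.

    Returns (estimates, t): a dict mapping each sampled index i to its
    weight estimate W_i = max(w_i, t), and the threshold t, which is the
    (k+1)th highest priority. Non-sampled items implicitly have W_i = 0.
    """
    rng = rng or random.Random()
    priorities = []
    for i, w in enumerate(weights):
        # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1].
        r = 1.0 - rng.random()
        priorities.append((w / r, i))

    # The k+1 highest priorities, in descending order.
    top = heapq.nlargest(k + 1, priorities)
    if len(top) <= k:
        # Assumption: with at most k items, all are sampled and t = 0.
        t, sampled = 0.0, top
    else:
        t, sampled = top[k][0], top[:k]

    estimates = {i: max(weights[i], t) for _, i in sampled}
    return estimates, t


def estimate_subset_sum(estimates, subset):
    """Estimate the total weight of `subset` (an iterable of indices)
    by adding the weight estimates of the sampled items it contains."""
    return sum(estimates.get(i, 0.0) for i in subset)


if __name__ == "__main__":
    weights = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    estimates, t = priority_sample(weights, k=4, rng=random.Random(0))
    even_items = [i for i in range(len(weights)) if i % 2 == 0]
    print("threshold:", t)
    print("estimated weight of even-indexed items:",
          estimate_subset_sum(estimates, even_items))
```

Because the sample is drawn once, before any subset is chosen, the same `estimates` dictionary can answer queries for any subset afterwards. A single estimate can be far from the true sum; the unbiasedness stated in the abstract concerns the average over many independent samples.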


