Stream sampling for variance-optimal estimation of subset sums

Computer Science – Data Structures and Algorithms

Scientific paper

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

31 pages. An extended abstract appeared in the proceedings of the 20th ACM-SIAM Symposium on Discrete Algorithms (SODA 2009)

Scientific paper

From a high volume stream of weighted items, we want to maintain a generic sample of a certain limited size $k$ that we can later use to estimate the total weight of arbitrary subsets. This is the classic context of on-line reservoir sampling, thinking of the generic sample as a reservoir. We present an efficient reservoir sampling scheme, $\varoptk$, that dominates all previous schemes in terms of estimation quality. $\varoptk$ provides {\em variance optimal unbiased estimation of subset sums}. More precisely, if we have seen $n$ items of the stream, then for {\em any} subset size $m$, our scheme based on $k$ samples minimizes the average variance over all subsets of size $m$. In fact, the optimality is against any off-line scheme with $k$ samples tailored for the concrete set of items seen. In addition to optimal average variance, our scheme provides tighter worst-case bounds on the variance of {\em particular} subsets than previously possible. It is efficient, handling each new item of the stream in $O(\log k)$ time. Finally, it is particularly well suited for combination of samples from different streams in a distributed setting.

No associations

LandOfFree

Say what you really think

Search LandOfFree.com for scientists and scientific papers. Rate them and share your experience with other people.

Rating

Stream sampling for variance-optimal estimation of subset sums does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.

If you have personal experience with Stream sampling for variance-optimal estimation of subset sums, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Stream sampling for variance-optimal estimation of subset sums will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFWR-SCP-O-406458

All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.