Stream sampling for variance-optimal estimation of subset sums

Computer Science – Data Structures and Algorithms

Scientific paper

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0


31 pages. An extended abstract appeared in the proceedings of the 20th ACM-SIAM Symposium on Discrete Algorithms (SODA 2009)

Scientific paper

From a high volume stream of weighted items, we want to maintain a generic sample of a certain limited size $k$ that we can later use to estimate the total weight of arbitrary subsets. This is the classic context of on-line reservoir sampling, thinking of the generic sample as a reservoir. We present an efficient reservoir sampling scheme, $\varoptk$, that dominates all previous schemes in terms of estimation quality. $\varoptk$ provides {\em variance optimal unbiased estimation of subset sums}. More precisely, if we have seen $n$ items of the stream, then for {\em any} subset size $m$, our scheme based on $k$ samples minimizes the average variance over all subsets of size $m$. In fact, the optimality is against any off-line scheme with $k$ samples tailored for the concrete set of items seen. In addition to optimal average variance, our scheme provides tighter worst-case bounds on the variance of {\em particular} subsets than previously possible. It is efficient, handling each new item of the stream in $O(\log k)$ time. Finally, it is particularly well suited for combination of samples from different streams in a distributed setting.

No associations


Say what you really think

Search for scientists and scientific papers. Rate them and share your experience with other people.


Stream sampling for variance-optimal estimation of subset sums does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.

If you have personal experience with Stream sampling for variance-optimal estimation of subset sums, we encourage you to share that experience with our community. Your opinion is very important and Stream sampling for variance-optimal estimation of subset sums will most certainly appreciate the feedback.

Rate now


Profile ID: LFWR-SCP-O-406458

All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.