Towards "Intelligent Compression" in Streams: A Biased Reservoir Sampling based Bloom Filter Approach

Computer Science – Information Retrieval

Scientific paper

Details

11 pages, 8 figures, 5 tables

With the explosion of information stored worldwide, data-intensive computing has become a central area of research. Efficient management and processing of this exponentially growing volume of data from diverse sources, such as telecommunication call data records and online transaction records, has become a necessity. Removing redundancy from such huge (multi-billion record) datasets, which yields resource and compute efficiency for downstream processing, constitutes an important area of study. "Intelligent compression", or deduplication, in streaming scenarios, that is, the precise identification and elimination of duplicates from an unbounded data stream, is a greater challenge given the real-time nature of data arrival. Stable Bloom Filters (SBF) address this problem to a certain extent. However, SBF suffers from a high false negative rate (FNR) and a slow convergence rate, rendering it inefficient for applications with low FNR tolerance. In this paper, we present a novel Reservoir Sampling based Bloom Filter (RSBF) data structure, which combines the concepts of reservoir sampling and Bloom filters for approximate detection of duplicates in data streams. Using detailed theoretical analysis, we prove analytical bounds on its false positive rate (FPR), false negative rate (FNR), and convergence rate with low memory requirements. We show that RSBF offers the lowest FNR and convergence rates currently known, both better than those of SBF while using the same memory. Using empirical analysis on real-world datasets (3 million records) and synthetic datasets with around 1 billion records, we demonstrate up to 2x improvement in FNR with better convergence rates compared to SBF, while exhibiting comparable FPR. To the best of our knowledge, this is the first attempt to integrate the reservoir sampling method with Bloom filters for deduplication in streaming scenarios.
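The abstract does not spell out how RSBF works, so the following is only a loose illustration of the general idea it names: a Bloom-style bit structure whose insertions are governed by a reservoir-sampling probability. Everything here is an assumption made for the sketch and not the authors' algorithm: the class name `ReservoirBloomSketch`, the split into `num_hashes` equal sub-filters, the insertion probability `min(1, total_bits / seen)`, and the policy of clearing one random bit per sub-filter on each insertion.

```python
import hashlib
import random


class ReservoirBloomSketch:
    """Hypothetical duplicate detector: Bloom-style bits + reservoir-style insertion."""

    def __init__(self, total_bits, num_hashes, seed=0):
        self.k = num_hashes
        self.m = total_bits // num_hashes          # bits per sub-filter
        # One byte per bit keeps the sketch readable (not memory-optimal).
        self.filters = [bytearray(self.m) for _ in range(self.k)]
        self.total_bits = self.m * self.k
        self.seen = 0                              # stream position so far
        self.rng = random.Random(seed)

    def _positions(self, item):
        # One salted BLAKE2b hash per sub-filter (an assumed hash choice).
        data = str(item).encode("utf-8")
        positions = []
        for i in range(self.k):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            positions.append(int.from_bytes(digest, "little") % self.m)
        return positions

    def observe(self, item):
        """Report whether `item` looks like a duplicate, then maybe insert it."""
        self.seen += 1
        positions = self._positions(item)
        is_dup = all(f[p] for f, p in zip(self.filters, positions))
        if is_dup:
            return True
        # Reservoir-style step: insertion probability decays as the stream grows.
        if self.rng.random() < min(1.0, self.total_bits / self.seen):
            for f, p in zip(self.filters, positions):
                f[self.rng.randrange(self.m)] = 0  # evict one random bit
                f[p] = 1                           # then record the new item
        return False


sketch = ReservoirBloomSketch(total_bits=1024, num_hashes=4)
first = sketch.observe("call-record-17")   # not seen before
second = sketch.observe("call-record-17")  # same key again
```

In this sketch, false negatives can come from two sources, evicted bits and skipped insertions, while false positives come from hash collisions among set bits. That trade-off is the kind the abstract's FNR and FPR bounds are concerned with, though the bounds themselves apply to the authors' structure, not to this illustration.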

Profile ID: LFWR-SCP-O-701515
