Bayesian Locality Sensitive Hashing for Fast Similarity Search

Computer Science – Databases

Scientific paper

Details

13 pages, 5 tables, 21 figures. Added acknowledgments in v3. A slightly shorter version of this paper without the appendix has

Given a collection of objects and an associated similarity measure, the all-pairs similarity search problem asks us to find all pairs of objects whose similarity exceeds a user-specified threshold. Methods based on locality-sensitive hashing (LSH) have become a very popular approach to this problem. However, most such methods use LSH only for the first phase of similarity search, that is, efficient indexing for candidate generation. In this paper, we present BayesLSH, a principled Bayesian algorithm for the subsequent phase of similarity search: candidate pruning and similarity estimation using LSH. A simpler variant, BayesLSH-Lite, which calculates similarities exactly, is also presented. BayesLSH quickly prunes away a large majority of the false positive candidate pairs, leading to significant speedups over baseline approaches. For BayesLSH, we also provide probabilistic guarantees on the quality of the output, in terms of both accuracy and recall. Finally, the quality of BayesLSH's output can be easily tuned, and, unlike standard approaches, it does not require manually setting the number of hashes used for similarity estimation. For two state-of-the-art candidate generation algorithms, AllPairs and LSH, BayesLSH enables significant speedups, typically in the range 2x-20x across a wide variety of datasets.
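
To make the idea of Bayesian candidate pruning concrete, the following is a minimal sketch and not the authors' implementation. It assumes Jaccard similarity estimated with min-wise hashes, a uniform prior on the unknown similarity, and a pruning rule that drops a candidate pair once the posterior probability of its similarity reaching the threshold falls below a small epsilon. All names here (hash_value, minhash_signature, prob_similarity_at_least, prune_or_estimate) and the default values of epsilon and batch are hypothetical choices made for illustration; the abstract does not specify them.

```python
import hashlib
import math


def hash_value(salt, item):
    """Deterministic 64-bit hash of (salt, item).

    Assumption: each salt stands in for one min-wise hash function.
    """
    data = f"{salt}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes):
    """One minimum hash value per salt; two signatures agree at a position
    with probability approximately equal to the Jaccard similarity."""
    return [min(hash_value(salt, item) for item in items)
            for salt in range(num_hashes)]


def prob_similarity_at_least(matches, trials, threshold):
    """P(s >= threshold | matches out of trials) under a uniform prior on s.

    The posterior is Beta(matches + 1, trials - matches + 1). For integer
    parameters its upper tail equals a binomial lower tail:
    P(s >= t) = P(Binomial(trials + 1, t) <= matches).
    """
    n = trials + 1
    return sum(
        math.comb(n, k) * threshold**k * (1.0 - threshold) ** (n - k)
        for k in range(matches + 1)
    )


def prune_or_estimate(sig_a, sig_b, threshold, epsilon=0.03, batch=32):
    """Compare hashes incrementally; return None if the pair is pruned,
    otherwise the fraction of matching hashes as the similarity estimate."""
    matches = 0
    for n, (ha, hb) in enumerate(zip(sig_a, sig_b), start=1):
        matches += ha == hb
        # Check the posterior only every `batch` hashes to keep the test cheap.
        if n % batch == 0 and prob_similarity_at_least(matches, n, threshold) < epsilon:
            return None
    return matches / len(sig_a)


# Toy data: a and b overlap heavily (Jaccard 95/105), a and c barely (10/190).
a = set(range(0, 100))
b = set(range(5, 105))
c = set(range(90, 190))
sig_a, sig_b, sig_c = (minhash_signature(s, 256) for s in (a, b, c))

near = prune_or_estimate(sig_a, sig_b, threshold=0.7)
far = prune_or_estimate(sig_a, sig_c, threshold=0.7)
print("near pair:", near)
print("far pair:", far)
```

In this sketch a dissimilar pair is typically discarded after the first batch of hash comparisons, which illustrates where the speedup over comparing a fixed number of hashes for every candidate would come from. The sketch always uses the full signature for pairs that survive pruning; it does not attempt to reproduce how BayesLSH decides when an estimate is accurate enough to stop.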
