Hashing Algorithms for Large-Scale Learning

Statistics – Machine Learning

Scientific paper


In this paper, we first demonstrate that b-bit minwise hashing, whose estimators are positive definite kernels, can be naturally integrated with learning algorithms such as SVM and logistic regression. We adopt a simple scheme to transform the nonlinear (resemblance) kernel into a linear (inner product) kernel, so that large-scale problems can be solved extremely efficiently. Our method provides a simple, effective solution to large-scale learning on massive and extremely high-dimensional datasets, especially when the data do not fit in memory. We then compare b-bit minwise hashing with the Vowpal Wabbit (VW) algorithm (which is related to the Count-Min (CM) sketch). Interestingly, VW has the same variances as random projections. Our theoretical and empirical comparisons illustrate that b-bit minwise hashing is usually significantly more accurate (at the same storage) than VW (and random projections) on binary data. Furthermore, b-bit minwise hashing can be combined with VW to achieve further improvements in training speed, especially when b is large.
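The scheme described above can be sketched in a few lines: hash each input set with several random hash functions, keep only the lowest b bits of each minimum, and one-hot encode those bits so that an ordinary inner product approximates the resemblance kernel. The sketch below is a simplified illustration, not the paper's implementation; the function names are hypothetical, and simple random linear hashes stand in for the random permutations analyzed in the paper.

```python
import random

def bbit_minwise_features(s, num_hashes=64, b=2, seed=0):
    """Map a set of integer tokens to a b-bit minwise feature vector.

    Each of `num_hashes` hash functions yields a minimum hash value over
    the set; keeping only its lowest b bits and one-hot encoding the
    result into a 2^b-dimensional block turns the (nonlinear) resemblance
    kernel into a plain inner product usable by linear SVM / logistic
    regression solvers.
    """
    rng = random.Random(seed)
    p = (1 << 31) - 1  # a Mersenne prime for the hash modulus
    # Random linear hashes (a*x + c) mod p: a simplified stand-in for
    # the random permutations used in minwise hashing.
    params = [(rng.randrange(1, p), rng.randrange(p))
              for _ in range(num_hashes)]
    dim = 1 << b                       # 2^b slots per hash function
    features = [0] * (num_hashes * dim)
    for j, (a, c) in enumerate(params):
        m = min((a * x + c) % p for x in s)
        features[j * dim + (m & (dim - 1))] = 1  # lowest b bits, one-hot
    return features

def linear_kernel(u, v):
    """Inner product of two feature vectors."""
    return sum(x * y for x, y in zip(u, v))
```

With this encoding, more similar sets collide in more blocks, so their inner product is larger; two identical sets agree in all `num_hashes` blocks, while disjoint sets agree only by chance (with probability about 2^-b per block).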

