Even Better Framework for min-wise Based Algorithms

Computer Science – Data Structures and Algorithms

Scientific paper

Rate now

  [ 0.00 ] – not rated yet Voters 0   Comments 0

Details

10 pages + appendix. 15 pages total

Scientific paper

In a recent paper from SODA11 \cite{kminwise} the authors introduced a general framework for exponential time improvement of \minwise based algorithms by defining and constructing almost \kmin independent family of hash functions. Here we take it a step forward and reduce the space and the independent needed for representing the functions, by defining and constructing a \dkmin independent family of hash functions. Surprisingly, for most cases only 8-wise independent is needed for exponential time and space improvement. Moreover, we bypass the $O(\log{\frac{1}{\epsilon}})$ independent lower bound for approximately \minwise functions \cite{patrascu10kwise-lb}, as we use alternative definition. In addition, as the independent's degree is a small constant it can be implemented efficiently. Informally, under this definition, all subsets of size $d$ of any fixed set $X$ have an equal probability to have hash values among the minimal $k$ values in $X$, where the probability is over the random choice of hash function from the family. This property measures the randomness of the family, as choosing a truly random function, obviously, satisfies the definition for $d=k=|X|$. We define and give an efficient time and space construction of approximately \dkmin independent family of hash functions. The degree of independent required is optimal, i.e. only $O(d)$ for $2 \le d < k=O(\frac{d}{\epsilon^2})$, where $\epsilon \in (0,1)$ is the desired error bound. This construction can be used to improve many \minwise based algorithms, such as \cite{sizeEstimationFramework,Datar02estimatingrarity,NearDuplicate,SimilaritySearch,DBLP:conf/podc/CohenK07}, as will be discussed here. To our knowledge such definitions, for hash functions, were never studied and no construction was given before.

No associations

LandOfFree

Say what you really think

Search LandOfFree.com for scientists and scientific papers. Rate them and share your experience with other people.

Rating

Even Better Framework for min-wise Based Algorithms does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.

If you have personal experience with Even Better Framework for min-wise Based Algorithms, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Even Better Framework for min-wise Based Algorithms will most certainly appreciate the feedback.

Rate now

     

Profile ID: LFWR-SCP-O-380367

  Search
All data on this website is collected from public sources. Our data reflects the most accurate information available at the time of publication.