Computer Science – Data Structures and Algorithms
Scientific paper
2011-02-17
Computer Science
Data Structures and Algorithms
10 pages + appendix. 15 pages total
Scientific paper
In a recent paper from SODA11 \cite{kminwise} the authors introduced a general framework for exponential time improvement of \minwise based algorithms by defining and constructing almost \kmin independent family of hash functions. Here we take it a step forward and reduce the space and the independent needed for representing the functions, by defining and constructing a \dkmin independent family of hash functions. Surprisingly, for most cases only 8-wise independent is needed for exponential time and space improvement. Moreover, we bypass the $O(\log{\frac{1}{\epsilon}})$ independent lower bound for approximately \minwise functions \cite{patrascu10kwise-lb}, as we use alternative definition. In addition, as the independent's degree is a small constant it can be implemented efficiently. Informally, under this definition, all subsets of size $d$ of any fixed set $X$ have an equal probability to have hash values among the minimal $k$ values in $X$, where the probability is over the random choice of hash function from the family. This property measures the randomness of the family, as choosing a truly random function, obviously, satisfies the definition for $d=k=|X|$. We define and give an efficient time and space construction of approximately \dkmin independent family of hash functions. The degree of independent required is optimal, i.e. only $O(d)$ for $2 \le d < k=O(\frac{d}{\epsilon^2})$, where $\epsilon \in (0,1)$ is the desired error bound. This construction can be used to improve many \minwise based algorithms, such as \cite{sizeEstimationFramework,Datar02estimatingrarity,NearDuplicate,SimilaritySearch,DBLP:conf/podc/CohenK07}, as will be discussed here. To our knowledge such definitions, for hash functions, were never studied and no construction was given before.
Feigenblat Guy
Porat Ely
Shiftan Ariel
No associations
LandOfFree
Even Better Framework for min-wise Based Algorithms does not yet have a rating. At this time, there are no reviews or comments for this scientific paper.
If you have personal experience with Even Better Framework for min-wise Based Algorithms, we encourage you to share that experience with our LandOfFree.com community. Your opinion is very important and Even Better Framework for min-wise Based Algorithms will most certainly appreciate the feedback.
Profile ID: LFWR-SCP-O-380367