Locality-sensitive hashing: Difference between revisions

Content deleted Content added
Tappoz (talk | contribs)
add interwiki
Line 65:
 
===Min-Wise Independent Permutations===
Suppose <math>U</math> is composed of subsets of some ground set of enumerable items <math>S</math> and the similarity function of interest is the [[Jaccard index| Jaccard index]] <math>J</math>. If <math>\pi</math> is a permutation on the indices of <math>S</math>, for <math>A \subseteq S</math> let <math>h(A) = \min_{a \in A} \{ \pi(a) \}</math>. Each possible choice of <math>\pi</math> defines a single hash function <math>h</math> mapping input sets to integers.
 
Define the function family <math>H</math> to be the set of all such functions and let <math>D</math> be the uniform distribution. Given two sets <math>A,B \subseteq S</math> the event that <math>h(A) = h(B)</math> corresponds exactly to the event that the minimizer of <math>\pi</math> lies inside <math>A \bigcap B</math>. As <math>h</math> was chosen uniformly at random, <math>Pr[h(A) = h(B)] = J(A,B)\,</math> and <math>(H,D)\,</math> define an LSH scheme for the Jaccard index.