Locality-sensitive hashing: Difference between revisions

Content deleted Content added
Adding local short description: "Algorithmic technique using hashing", overriding Wikidata description "method of dimension reduction in which closer items have greater probability of being mapped to the same hash bucket"
Add clarification for parameters of bit sampling methd
Line 134:
===Bit sampling for Hamming distance===
 
One of the easiest ways to construct an LSH family is by bit sampling.<ref name=IndykMotwani98 /> This approach works for the [[Hamming distance]] over {{mvar|d}}-dimensional vectors <math>\{0,1\}^d</math>. Here, the family <math>\mathcal F</math> of hash functions is simply the family of all the projections of points on one of the <math>d</math> coordinates, i.e., <math>{\mathcal F}=\{h\colon \{0,1\}^d\to \{0,1\}\mid h(x)=x_i \text{ for some } i\in \{1, \ldots, d\}\}</math>, where <math>x_i</math> is the <math>i</math>th coordinate of <math>x</math>. A random function <math>h</math> from <math>{\mathcal F}</math> simply selects a random bit from the input point. This family has the following parameters: <math>P_1=1-R/d</math>, <math>P_2=1-cR/d</math>.{{Clarify|date=February 2018}}
That is, any two vectors <math>x,y</math> with Hamming distance at most <math>R</math> collide under a random <math>h</math> with probability at least <math>P_1</math>.
Any <math>x,y</math> with Hamming distance at least <math>cR</math> collide with probability at most <math>P_2</math>.
 
===Min-wise independent permutations===