Content deleted Content added
Adding local short description: "Algorithmic technique using hashing", overriding Wikidata description "method of dimension reduction in which closer items have greater probability of being mapped to the same hash bucket" |
Add clarification for parameters of bit sampling methd |
||
Line 134:
===Bit sampling for Hamming distance===
One of the easiest ways to construct an LSH family is by bit sampling.<ref name=IndykMotwani98 /> This approach works for the [[Hamming distance]] over {{mvar|d}}-dimensional vectors <math>\{0,1\}^d</math>. Here, the family <math>\mathcal F</math> of hash functions is simply the family of all the projections of points on one of the <math>d</math> coordinates, i.e., <math>{\mathcal F}=\{h\colon \{0,1\}^d\to \{0,1\}\mid h(x)=x_i \text{ for some } i\in \{1, \ldots, d\}\}</math>, where <math>x_i</math> is the <math>i</math>th coordinate of <math>x</math>. A random function <math>h</math> from <math>{\mathcal F}</math> simply selects a random bit from the input point. This family has the following parameters: <math>P_1=1-R/d</math>, <math>P_2=1-cR/d</math>.
That is, any two vectors <math>x,y</math> with Hamming distance at most <math>R</math> collide under a random <math>h</math> with probability at least <math>P_1</math>.
Any <math>x,y</math> with Hamming distance at least <math>cR</math> collide with probability at most <math>P_2</math>.
===Min-wise independent permutations===
|