Revision as of 16:12, 10 March 2023 edit Liu1126 (talk \| contribs) Extended confirmed users, Pending changes reviewers, Rollbackers 12,270 edits Adding local short description: "Algorithmic technique using hashing", overriding Wikidata description "method of dimension reduction in which closer items have greater probability of being mapped to the same hash bucket" Tag: Shortdesc helper ← Previous edit		Revision as of 19:14, 30 March 2023 edit undo Drmrelephantsr (talk \| contribs) 5 edits Add clarification for parameters of bit sampling methd Next edit →
Line 134: ===Bit sampling for Hamming distance=== One of the easiest ways to construct an LSH family is by bit sampling.<ref name=IndykMotwani98 /> This approach works for the [[Hamming distance]] over {{mvar\|d}}-dimensional vectors <math>\{0,1\}^d</math>. Here, the family <math>\mathcal F</math> of hash functions is simply the family of all the projections of points on one of the <math>d</math> coordinates, i.e., <math>{\mathcal F}=\{h\colon \{0,1\}^d\to \{0,1\}\mid h(x)=x_i \text{ for some } i\in \{1, \ldots, d\}\}</math>, where <math>x_i</math> is the <math>i</math>th coordinate of <math>x</math>. A random function <math>h</math> from <math>{\mathcal F}</math> simply selects a random bit from the input point. This family has the following parameters: <math>P_1=1-R/d</math>, <math>P_2=1-cR/d</math>.~~{{Clarify\|date=February 2018}}~~ That is, any two vectors <math>x,y</math> with Hamming distance at most <math>R</math> collide under a random <math>h</math> with probability at least <math>P_1</math>. Any <math>x,y</math> with Hamming distance at least <math>cR</math> collide with probability at most <math>P_2</math>. ===Min-wise independent permutations===

Locality-sensitive hashing: Difference between revisions