Locality-sensitive hashing: Difference between revisions

Content deleted Content added
Cleaned LSH definition and removed Locality Preserving Hashing. Indeed, the latter lacked citations and was confusing.
Line 7:
 
==Definitions==
AnConsider a family <math> \mathcal F</math> of functions <math>h\colon M \to S</math>. We say that <math> \mathcal F </math> is an ''LSH family''<ref name=MOMD /><ref name=GIM1999>{{cite journal
| author1 = Gionis, A.
| author2-link = Piotr Indyk | author2 = Indyk, P. | author3-link = Rajeev Motwani | author3 = Motwani, R.
Line 23:
| title-link = Symposium on Theory of Computing
}}</ref>
for
<math>\mathcal F</math> is defined for
* a [[metric space]] <math>\mathcal M =(M, d)</math>,
* a threshold <math>Rr>0</math>,
* an approximation factor <math>c>1</math>,
* and probabilities <math>P_1</mathp_1 > and <math>P_2p_2</math>.
 
Thisif familyit <math>\mathcal F</math> is a set of functions <math>h\colon M \to S</math> that map elements ofsatisfies the metricfollowing space to buckets <math>s \in S</math>condition. An LSH family must satisfy the following conditions forFor any two points <math>pa, qb \in M</math> and anya hash function <math>h</math> chosen uniformly at random from <math>\mathcal F</math>:
* ifIf <math>d(pa,qb) \le Rr</math>, then <math>h(pa)=h(qb)</math> (i.e., {{mvar|p}} and {{mvar|q}} collide) with probability at least <math>P_1p_1</math>,
* ifIf <math>d(pa,qb) \ge cRcr</math>, then <math>h(pa)=h(qb)</math> with probability at most <math>P_2p_2</math>.
 
A family is interesting when <math>P_1>P_2</math>. Such a family <math>\mathcal F</math> is called ''<math>(Rr,cRcr,P_1p_1,P_2p_2)</math>-sensitive''.
 
====LSH with respect to a similarity measure====
Alternatively<ref name=Charikar2002>{{cite conference
| last = Charikar|first= Moses S.|author-link=Moses Charikar
Line 42 ⟶ 43:
| pages = 380–388
| doi = 10.1145/509907.509965
| citeseerx = 10.1.1.147.4064}}</ref> it is definedpossible withto respectdefine toan LSH family on a universe of items {{mvar|U}} thatendowed havewith a [[String metric|similarity]] function <math>\phi\colon U \times U \to [0,1]</math>. AnIn this setting, a LSH scheme is a family of [[hash function]]s {{mvar|H}} coupled with a [[probability distribution]] {{mvar|D}} over the functions{{mvar|H}} such that a function <math>h \in H</math> chosen according to {{mvar|D}} satisfies the property that <math>Pr_{h \in H}Pr [h(a) = h(b)] = \phi(a,b)</math> for anyeach <math>a,b \in U</math>.
 
===Locality-preserving hashing===
A '''locality-preserving hash''' is a [[hash function]] {{mvar|f}} that maps points in a [[metric space]] <math>\mathcal M =(M, d)</math> to a scalar value such that
:<math>d(p,q) < d(q,r) \Rightarrow |f(p) - f(q)| < |f(q) - f(r)|</math>
 
for any three points <math>p,q,r \in M</math>.{{citation needed|date=November 2022|reason=unusual definition}}
 
In other words, these are hash functions where the relative distance between the input values is preserved in the relative distance between the output hash values; input values that are closer to each other will produce output hash values that are closer to each other.
 
This is in contrast to [[cryptography|cryptographic]] hash functions and [[checksum]]s, which are designed to have [[Avalanche effect|random output difference between adjacent inputs]].
 
Locality preserving hashes are related to [[space-filling curve]]s.{{how|date=March 2020}}
 
===Amplification===