Revision as of 05:33, 8 August 2023 edit David Eppstein (talk \| contribs) Autopatrolled, Administrators 235,633 edits restore incorrectly removed subtitle ← Previous edit		Revision as of 06:50, 8 August 2023 edit undo Headbomb (talk \| contribs) Edit filter managers, Autopatrolled, Extended confirmed users, Page movers, File movers, New page reviewers, Pending changes reviewers, Rollbackers, Template editors 473,085 edits m clean up Tag: AWB Next edit →
Line 25: * a threshold <math>R>0</math>, * an approximation factor <math>c>1</math>, * and probabilities <math>P_1</math> and <math>P_2</math>. This family <math>\mathcal F</math> is a set of functions <math>h\colon M \to S</math> that map elements of the metric space to buckets <math>s \in S</math>. An LSH family must satisfy the following conditions for any two points <math>p, q \in M</math> and any hash function <math>h</math> chosen uniformly at random from <math>\mathcal F</math>: Line 52: This is in contrast to [[cryptography\|cryptographic]] hash functions and [[checksum]]s, which are designed to have [[Avalanche effect\|random output difference between adjacent inputs]]. The first family of locality-preserving hash functions was devised as a way to facilitate [[Pipeline (computing)\|data pipelining]] in implementations of [[Parallel RAM\|parallel random-access machine (PRAM)]] algorithms that use [[universal hashing]] to reduce memory [[~~Resource_contention~~Resource contention\|contention]] and [[network congestion]].<ref name=Chin1991>{{cite thesis \|last=Chin \|first=Andrew \|date=1991 \|title=Complexity Issues in General Purpose Parallel Computing \|pages=87–95 \|type=DPhil \|publisher=University of Oxford \|url=https://perma.cc/E47H-WCVP}}</ref><ref name=Chin1994>{{cite journal \|last1=Chin \|first1=Andrew \|date=1994 \|title=Locality-Preserving Hash Functions for General Purpose Parallel Computation \|url=http://unclaw.com/chin/scholarship/hashfunctions.pdf \|journal=Algorithmica \|volume=12 \|issue=2–3 \|pages=170–181 \|doi=10.1007/BF01185209\|s2cid=18108051 }}</ref> Locality preserving hashes are related to [[space-filling curve]]s.{{how\|date=March 2020}} Line 73: \| title = Proceedings of the 16th international conference on World Wide Web \| chapter = Google news personalization: scalable online collaborative filtering ~~\| date = 2007~~ \| pages = 271–280 \| year = 2007\|doi=10.1145/1242572.1242610\|display-authors=etal\| isbn = 9781595936547 Line 88 ⟶ 87: \|title=Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data \|chapter=Twister Tries ~~\|date=2015~~ \| pages=505–517 \| year = 2015 \| doi=10.1145/2723372.2751521\| isbn = 9781450327589 Line 241 ⟶ 239: <math>Pr[h(u) = h(v)] = 1 - \frac{\theta(u,v)}{\pi}</math> <math>\frac{\theta(u,v)}{\pi}</math> is proportional to <math>1-\cos(\theta(u,v))</math>. In fact the ratio is always within a factor of .87856.<ref name=Charikar2002 /><ref name="Goemans Williamson 1995 pp. 1115–1145">{{cite journal \| last1=Goemans \| first1=Michel X. \| last2=Williamson \| first2=David P. \| title=Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming \| journal=Journal of the ACM \| publisher=Association for Computing Machinery (ACM) \| volume=42 \| issue=6 \| year=1995 \| issn=0004-5411 \| doi=10.1145/227683.227684 \| pages=1115–1145\| s2cid=15794408 }}</ref> Meaning the probability of the two vectors being on the same side of the random hyperplane is proportional to the [[~~Cosine_similarity~~Cosine similarity#~~Cosine_Distance~~Cosine Distance\|cosine distance]] between them. ===Stable distributions=== Line 278 ⟶ 276: ===Semantic hashing=== Semantic hashing is a technique that attempts to map input items to addresses such that closer inputs have higher [[semantic similarity]].<ref>{{Cite journal\|last1=Salakhutdinov\|first1=Ruslan\|last2=Hinton\|first2=Geoffrey\|date=2008\|title=Semantic hashing\|journal=International Journal of Approximate Reasoning\|language=en\|volume=50\|issue=7\|pages=969–978\|doi=10.1016/j.ijar.2008.11.006\|doi-access=free}}</ref> The hashcodes are found via training of an [[artificial neural network]] or [[graphical model]].{{cncitation needed\|date=September 2021}} ==Algorithm for nearest neighbor search== Line 333 ⟶ 331: Samet, H. (2006) ''Foundations of Multidimensional and Metric Data Structures''. Morgan Kaufmann. {{ISBN\|0-12-369446-9}} {{Cite conference\| last1 = Indyk \| first1 = Piotr \| author1-link= Piotr Indyk\| last2 = Motwani \| first2 = Rajeev \| author2-link = Rajeev Motwani\| last3 = Raghavan \| first3 = Prabhakar\| last4 = Vempala \| first4 = Santosh \| author4-link = Santosh Vempala\| title = Locality-preserving hashing in multidimensional spaces \| doi = 10.1145/258533.258656 \| book-title = Proceedings of the twenty-ninth annual ACM symposium on Theory of computing\| conference = [[Symposium on Theory of Computing\|STOC '97]]\| pages = 618–625\| year = 1997 \| isbn = 978-0-89791-888-6\| citeseerx = 10.1.1.50.4927\| s2cid = 15693787 }} *{{Cite journal \| last1 = Chin \| first1 = Andrew\| title = Locality-preserving hash functions for general purpose parallel computation \| url = http://www.unclaw.com/chin/scholarship/hashfunctions.pdf\| doi = 10.1007/BF01185209 \| journal = [[Algorithmica]]\| volume = 12 \| issue = 2–3 \| pages = 170–181 \| year = 1994 \| s2cid = 18108051}}

Locality-sensitive hashing: Difference between revisions