{{Short description|Algorithmic technique using hashing}}
In [[computer science]], '''locality-sensitive hashing''' ('''LSH''') is a [[fuzzy hashing]] technique that hashes similar input items into the same "buckets" with high probability.<ref name="MOMD">{{cite web|url=http://infolab.stanford.edu/~ullman/mmds.html|title=Mining of Massive Datasets, Ch. 3.|last1=Rajaraman|first1=A.|last2=Ullman|first2=J.|author2-link=Jeffrey Ullman|year=2010}}</ref>
Hashing-based approximate [[nearest-neighbor search]] algorithms generally use one of two main categories of hashing methods: either data-independent methods, such as locality-sensitive hashing (LSH); or data-dependent methods, such as locality-preserving hashing (LPH).<ref>{{cite conference |last1=Zhao |first1=Kang |last2=Lu |first2=Hongtao |last3=Mei |first3=Jincheng |title=Locality Preserving Hashing |conference=AAAI Conference on Artificial Intelligence | volume=28 | year=2014 |url=https://ojs.aaai.org/index.php/AAAI/article/view/9133/8992 |pages=2874–2880}}</ref><ref>{{cite book |last1=Tsai |first1=Yi-Hsuan |last2=Yang |first2=Ming-Hsuan |title=2014 IEEE International Conference on Image Processing (ICIP) |chapter=Locality preserving hashing |date=October 2014 |pages=2988–2992 |doi=10.1109/ICIP.2014.7025604 |isbn=978-1-4799-5751-4 |s2cid=8024458 |issn=1522-4880}}</ref>
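As a concrete illustration of this bucketing behaviour, the sketch below uses MinHash, one classical LSH family for Jaccard similarity of sets: under a random permutation of the universe, two sets hash to the same value with probability equal to their Jaccard similarity, so similar sets frequently land in the same bucket. The example sets and parameters are illustrative assumptions, not taken from the cited sources.

<syntaxhighlight lang="python">
import random
from collections import defaultdict

def minhash(s, perm):
    """MinHash: the smallest rank (under a random permutation) of any element of s."""
    return min(perm[x] for x in s)

universe = list(range(100))
rng = random.Random(42)
# A random permutation of the universe, stored as element -> rank.
perm = {x: r for r, x in enumerate(rng.sample(universe, len(universe)))}

sets = {
    "A": {1, 2, 3, 4, 5, 6, 7, 8},
    "B": {1, 2, 3, 4, 5, 6, 7, 9},   # very similar to A (Jaccard similarity 7/9)
    "C": {40, 41, 42, 43, 44},       # dissimilar to both
}

buckets = defaultdict(list)
for name, s in sets.items():
    buckets[minhash(s, perm)].append(name)   # A and B usually collide; C usually does not

print(dict(buckets))
</syntaxhighlight>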
===Random projection===
{{main|Random projection}}
[[File:Cosine-distance.png|thumb|<math>\frac{\theta(u,v)}{\pi}</math> is approximately proportional to <math>1-\cos(\theta(u,v))</math> on the interval <math>[0, \pi]</math>]]
The random projection method of LSH due to [[Moses Charikar]],<ref name=Charikar2002 /> called [[SimHash]] (also sometimes called arccos<ref name=Andoni2008>{{cite journal |last1=Andoni |first1=Alexandr |last2=Indyk |first2=Piotr |title=Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions |journal=Communications of the ACM |year=2008 |volume=51 |issue=1 |pages=117–122 |doi=10.1145/1327452.1327494}}</ref>), uses an approximation of the cosine distance between vectors.
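The following Python sketch is an illustrative reconstruction of the random-projection idea, not Charikar's original code: each vector is hashed to the sign pattern of its projections onto random hyperplanes, and the fraction of differing sign bits between two signatures estimates <math>\theta(u,v)/\pi</math>. The vectors and the number of hyperplanes are assumptions chosen for the example.

<syntaxhighlight lang="python">
import numpy as np

def simhash_signs(vectors, num_planes=64, seed=0):
    """Random-projection (SimHash-style) signatures: one sign bit per random hyperplane."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((num_planes, vectors.shape[1]))
    return vectors @ planes.T > 0            # boolean matrix: n_vectors x num_planes

def estimated_angle(sig_u, sig_v):
    """Estimate theta(u, v) from the fraction of sign bits on which u and v disagree."""
    return np.mean(sig_u != sig_v) * np.pi

u = np.array([1.0, 2.0, 0.5])
v = np.array([1.1, 1.9, 0.4])
sigs = simhash_signs(np.stack([u, v]), num_planes=1024)
true_angle = np.arccos(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
print(estimated_angle(sigs[0], sigs[1]), true_angle)   # the two values should be comparable
</syntaxhighlight>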
* space: <math>O(n^{1+\rho}P_1^{-1})</math>, plus the space for storing data points;
* query time: <math>O(n^{\rho}P_1^{-1}(kt+d))</math>;
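A minimal Python sketch of the kind of data structure these bounds refer to, for the fixed-radius (''R''-near neighbor) problem: {{mvar|L}} hash tables, each keyed by a concatenation <math>g_j(p)=(h_{j,1}(p),\dots,h_{j,k}(p))</math> of {{mvar|k}} hash functions, with a query inspecting only the {{mvar|L}} buckets that the query point maps to. Bit sampling over binary vectors is used here as the LSH family purely for illustration, and {{mvar|k}} and {{mvar|L}} are left as explicit parameters rather than derived from <math>P_1</math> and <math>P_2</math> as in the analysis above.

<syntaxhighlight lang="python">
import random
from collections import defaultdict

class FixedRadiusLSH:
    """L hash tables, each keyed by a concatenation of k bit-sampling hashes."""

    def __init__(self, dim, k, L, seed=0):
        rng = random.Random(seed)
        # g_j = (h_{j,1}, ..., h_{j,k}): k sampled coordinates per table.
        self.projections = [rng.sample(range(dim), k) for _ in range(L)]
        self.tables = [defaultdict(list) for _ in range(L)]

    def _key(self, j, p):
        return tuple(p[i] for i in self.projections[j])

    def insert(self, p):
        for j, table in enumerate(self.tables):
            table[self._key(j, p)].append(p)

    def query(self, q, radius):
        """Return some stored point within `radius` of q (Hamming distance), or None."""
        for j, table in enumerate(self.tables):
            for p in table.get(self._key(j, q), []):
                if sum(a != b for a, b in zip(p, q)) <= radius:
                    return p
        return None
</syntaxhighlight>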
===Finding nearest neighbor without fixed dimensionality===
To generalize the above algorithm to the case where the radius {{mvar|R}} is not fixed in advance, one can perform a binary search over {{mvar|R}} (a sketch follows the list below). It has been shown<ref>{{cite journal |last1=Har-Peled |first1=Sariel |last2=Indyk |first2=Piotr |last3=Motwani |first3=Rajeev |title=Approximate Nearest Neighbor: Towards Removing the Curse of Dimensionality |journal=Theory of Computing |date=2012 |volume=8 |issue=Special Issue in Honor of Rajeev Motwani |pages=321–350 |doi=10.4086/toc.2012.v008a014 |url=https://theoryofcomputing.org/articles/v008a014/v008a014.pdf |access-date=23 May 2025}}</ref> that there is a data structure for the approximate nearest neighbor problem with the following performance guarantees:
* space: <math>O(n^{1+\rho}P_1^{-1}d\log^2 n)</math>;
* query time: <math>O(n^{\rho}P_1^{-1}(kt+d)\log n)</math>;
* the algorithm succeeds in finding the nearest neighbor with probability at least <math>1 - (( 1 - P_1^k ) ^ L\log n)</math>;
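A hedged sketch of this generalization, reusing the <code>FixedRadiusLSH</code> class from the sketch above: one structure is built per candidate radius from a geometric sequence, and a query binary-searches the sorted radii for the smallest one at which some structure reports a near neighbor. The choice of radii and the use of Hamming distance are illustrative assumptions, not the exact construction of Har-Peled, Indyk and Motwani.

<syntaxhighlight lang="python">
def build_multi_radius(points, dim, radii, k, L):
    """Build one FixedRadiusLSH structure per candidate radius."""
    structures = {}
    for i, r in enumerate(sorted(radii)):
        s = FixedRadiusLSH(dim, k, L, seed=i)
        for p in points:
            s.insert(p)
        structures[r] = s
    return structures

def approximate_nearest(structures, q):
    """Binary search over the candidate radii for the smallest radius at which
    the corresponding structure finds a near neighbor of q (probabilistic, so
    the answer is approximate)."""
    radii = sorted(structures)
    best = None
    lo, hi = 0, len(radii) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        hit = structures[radii[mid]].query(q, radii[mid])
        if hit is not None:
            best = hit           # a neighbor was found at this radius; try smaller radii
            hi = mid - 1
        else:
            lo = mid + 1
    return best
</syntaxhighlight>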
===Improvements===
* {{Annotated link |Sparse distributed memory}}
* {{Annotated link |Wavelet compression}}
* {{Annotated link |Locality of reference}}
==References==
==External links==
* [http://web.mit.edu/andoni/www/LSH/index.html Alex Andoni's LSH homepage]
* [https://github.com/simonemainardi/LSHash A Python Locality Sensitive Hashing library that optionally supports persistence via redis]
* [https://web.archive.org/web/20101203074412/http://www.vision.caltech.edu/malaa/software/research/image-search/ Caltech Large Scale Image Search Toolbox]: a Matlab toolbox implementing several LSH hash functions, in addition to Kd-Trees, Hierarchical K-Means, and Inverted File search algorithms.