Locality-sensitive hashing: Difference between revisions

Content deleted Content added
Citation bot (talk | contribs)
Alter: issue, pages. Add: pages, website, s2cid, authors 1-1. Removed proxy/dead URL that duplicated identifier. Removed parameters. Formatted dashes. Some additions/deletions were parameter name changes. | Use this bot. Report bugs. | Suggested by SemperIocundus | #UCB_webform 1353/2500
Line 51:
This is in contrast to [[cryptography|cryptographic]] hash functions and [[checksum]]s, which are designed to have [[Avalanche effect|random output difference between adjacent inputs]].
 
The first family of locality-preserving hash functions was devised as a way to facilitate [[Pipeline (computing)|data pipelining]] in implementations of [[Parallel RAM|parallel random-access machine (PRAM)]] algorithms that use [[universal hashing]] to reduce memory [[Resource_contention|contention]] and [[network congestion]].<ref name=Chin1991>{{cite thesis |last=Chin |first=Andrew |date=1991 |title=Complexity Issues in General Purpose Parallel Computing |pages=87-9587–95 |type=DPhil |publisher=University of Oxford |url=https://perma.cc/E47H-WCVP}}</ref><ref name=Chin1994>{{cite journal |last1=Chin |first1=Andrew |date=1994 |title=Locality-Preserving Hash Functions for General Purpose Parallel Computation |url=http://unclaw.com/chin/scholarship/hashfunctions.pdf |journal=Algorithmica |volume=12 |issue=2-32–3 |pages=170-181170–181 |doi=10.1007/BF01185209|s2cid=18108051 }}</ref>
 
Locality preserving hashes are related to [[space-filling curve]]s.{{how|date=March 2020}}
Line 123:
| journal = 4th Cybercrime and Trustworthy Computing Workshop
| year = 2013
| pages = 7–13 | url = https://www.academia.edu/7833902
| doi = 10.1109/CTC.2013.9
| isbn = 978-1-4799-3076-0
Line 206:
Testing performed in the paper on a range of file types identified the Nilsimsa hash as having a significantly higher false positive rate when compared to other similarity digest schemes such as TLSH, Ssdeep and Sdhash.
 
An implementation of TLSH is available as [[open-source software]].<ref>{{cite web|url=https://github.com/trendmicro/tlsh |title=TLSH |website=[[GitHub]] |access-date=2014-04-10}}</ref>
 
===Random projection===
Line 268:
 
===Semantic hashing===
Semantic hashing is a technique that attempts to map input items to addresses such that closer inputs have higher [[semantic similarity]].<ref>{{Cite journal|lastlast1=Salakhutdinov|firstfirst1=Ruslan|last2=Hinton|first2=Geoffrey|date=2008|title=Semantic hashing|url=https://linkinghub.elsevier.com/retrieve/pii/S0888613X08001813|journal=International Journal of Approximate Reasoning|language=en|volume=50|issue=7|pages=969–978|doi=10.1016/j.ijar.2008.11.006|doi-access=free}}</ref> The hashcodes are found via training of an [[artificial neural network]] or [[graphical model]].{{cn|date=September 2021}}
 
==LSH algorithm for nearest neighbor search==