{{Short description|Class of fingerprinting algorithm}}
'''Perceptual hashing''' is the use of a [[fingerprint (computing)|fingerprinting algorithm]] that produces a snippet, [[hash function|hash]], or [[Fingerprint (computing)|fingerprint]] of various forms of [[multimedia]].<ref name=buldas13>{{cite book|last1=Buldas|first1=Ahto|last2=Kroonmaa|first2=Andres|last3=Laanoja|first3=Risto|editor-last=Riis|editor-first=Nielson H.|editor-last2=Gollmann|editor-first2=D.|title=Secure IT Systems. NordSec 2013|chapter=Keyless Signatures’ Infrastructure: How to Build Global Distributed Hash-Trees|publisher=Springer|location=Berlin, Heidelberg|year=2013|isbn=978-3-642-41487-9
==Development==
The 1980 work of [[Marr–Hildreth algorithm|Marr and Hildreth]] is a seminal paper in this field.<ref name=marr80>{{Cite journal |title=Theory of Edge Detection |first1=D. |last1=Marr |author1-link=David Marr (neuroscientist) |first2=E. |last2=Hildreth |author2-link=Ellen Hildreth |journal=Proceedings of the Royal Society of London. Series B, Biological Sciences |volume=207 |number=1167 |date=29 Feb 1980 |pages=187–217 |doi=10.1098/rspb.1980.0020|pmid=6102765 |bibcode=1980RSPSB.207..187M |s2cid=2150419 }}</ref>
In 2009, [[Microsoft Corporation]] developed [[PhotoDNA]] in collaboration with [[Hany Farid]], professor at [[Dartmouth College]]. PhotoDNA is a perceptual hashing capability developed to combat the distribution of [[child sexual abuse material]] (CSAM) online. Provided by Microsoft at no cost, PhotoDNA remains a critical tool used by major software companies, NGOs and law enforcement agencies around the world.<ref name="nytpdna">{{cite news |last1=Lohr |first1=Steve |title=Microsoft Tackles the Child Pornography Problem |date= December 2009 |work= New York Times |url=https://archive.nytimes.com/bits.blogs.nytimes.com/2009/12/16/microsoft-tackles-the-child-pornography-problem/}}</ref>
In June 2016 Azadeh Amir Asgari published work on robust image hash spoofing. Asgari notes that a perceptual hash function, like any other algorithm, is prone to errors.<ref name="asgari16">{{cite book |last1=Asgari |first1=Azadeh Amir |title=Robust image hash spoofing |date=June 2016 |publisher=Blekinge Institute of Technology |url=http://www.diva-portal.se/smash/get/diva2:946365/FULLTEXT01.pdf}}</ref>
Researchers remarked in December 2017 that [[Google image search]] is based on a perceptual hash.<ref name="agis">{{cite news |title=Google Image Search Explained |url=https://alibaba-cloud.medium.com/google-image-search-explained-30af8ba9cbea |publisher=Medium |date=26 December 2017}}</ref>
In research published in November 2021 investigators focused on a manipulated image of [[Stacey Abrams]] which was published to the internet prior to her loss in the [[2018 Georgia gubernatorial election]]. They found that the pHash algorithm was vulnerable to nefarious actors.<ref name="hao21">{{cite
In August 2021 Apple announced an on-device CSAM scanner called NeuralHash but, after strong privacy backlash, paused the rollout in September and formally cancelled it in December 2022.<ref name="wired2022">{{cite magazine |last=Newman |first=Lily Hay |title=Apple Kills Its Plan to Scan Your Photos for CSAM. Here's What's Next |url=https://www.wired.com/story/apple-photo-scanning-csam-communication-safety-messages/ |magazine=Wired |date=7 December 2022 |access-date=27 May 2025}}</ref>
Security researchers soon demonstrated that NeuralHash and similar deep perceptual hashes can be forced into collisions or evasion with imperceptible image changes.<ref name="struppek22">{{cite conference |last1=Struppek |first1=Lukas |last2=Hintersdorf |first2=Dominik |last3=Neider |first3=Daniel |last4=Kersting |first4=Kristian |title=Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash |book-title=Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22) |publisher=ACM |year=2022 |doi=10.1145/3531146.3533073|arxiv=2111.06628 }}</ref>
In October 2023 [[Meta Platforms|Meta]] introduced Stable Signature, an invisible watermark rooted in latent-diffusion generators, signalling a shift toward hybrid provenance schemes that combine watermarking with perceptual hashing.<ref name="meta2023">{{cite web |title=Stable Signature: A New Method for Watermarking Images Created by Generative AI |url=https://ai.meta.com/blog/stable-signature-watermarking-generative-ai/ |website=Meta AI Blog |date=20 October 2023 |access-date=27 May 2025}}</ref>
The open-source state of the art in 2025 was set by DINOHash, which adversarially fine-tunes self-supervised DINOv2 features and reports higher bit-accuracy under heavy crops, compression and adversarial gradient-based attacks than NeuralHash or classical DCT–DWT schemes.<ref name="dinohash25">{{cite arXiv |title=Provenance Detection for AI-Generated Images: Combining Perceptual Hashing, Homomorphic Encryption, and AI Detection Models |last1=Singhi |first1=Shree |last2=Yadav |first2=Aayan |last5=Gupta |first5=Aayush |last4=Ebrahimi |first4=Shariar |last3=Hassanizadeh |first3=Parisa |eprint=2503.11195 |year=2025|class=cs.CV }}</ref>
==Characteristics==
Research reported in January 2019 at [[Northumbria University]] has shown that perceptual hashing of video can be used simultaneously to identify similar content for [[video copy detection]] and to detect malicious manipulations for video authentication. The proposed system performs better than current [[video hash]]ing techniques in terms of both identification and authentication.<ref name=khelifi19>{{cite journal |last1=Khelifi |first1=Fouad |last2=Bouridane |first2=Ahmed |title=Perceptual Video Hashing for Content Identification and Authentication |journal=IEEE Transactions on Circuits and Systems for Video Technology |date=January 2019 |volume=29 |issue=1 |pages=50–67 |doi=10.1109/TCSVT.2017.2776159 |s2cid=55725934 |url=http://nrl.northumbria.ac.uk/32873/1/paper_double.pdf }}</ref>
Research reported in May 2020 by the [[University of Houston]] on deep-learning-based perceptual hashing for audio has shown better performance than traditional [[audio fingerprinting]] methods in detecting similar or copied audio that has been subjected to transformations.<ref name=bs20>{{Cite journal |last1=Báez-Suárez |first1=Abraham |last2=Shah |first2=Nolan |last3=Nolazco-Flores |first3=Juan Arturo |last4=Huang |first4=Shou-Hsuan S. |last5=Gnawali |first5=Omprakash |last6=Shi |first6=Weidong |date=2020-05-19 |title=SAMAF: Sequence-to-sequence Autoencoder Model for Audio Fingerprinting
In addition to its uses in digital forensics, research by a Russian group reported in 2019 has shown that perceptual hashing can be applied to a wide variety of situations. Similar to comparing images for copyright infringement, the group found that it could be used to compare and match images in a database. Their proposed algorithm proved to be not only effective, but more efficient than the standard means of database image searching.<ref name=zak19>{{cite book |last1=Zakharov |first1=Victor |last2=Kirikova |first2=Anastasia |last3=Munerman |first3=Victor |last4=Samoilova |first4=Tatyana |title=2019 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EICon ''Rus'') |chapter=Architecture of Software-Hardware Complex for Searching Images in Database |pages=1735–1739 |publisher=IEEE |doi=10.1109/EIConRus.2019.8657241 |isbn=978-1-7281-0339-6 |year=2019 |s2cid=71152337 }}</ref>
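The general idea of matching images in a database by perceptual hash can be illustrated with a minimal sketch. It does not reproduce the algorithm of the group cited above; it assumes a simple "average hash" (each bit records whether a block of the downscaled image is brighter than the mean) and a [[Hamming distance]] threshold, and the names <code>downscale</code>, <code>average_hash</code>, <code>hamming</code>, <code>find_matches</code> and the threshold value are hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence

Gray = Sequence[Sequence[float]]  # rows of grayscale pixel intensities


def downscale(img: Gray, size: int = 8) -> List[List[float]]:
    """Shrink an image to size x size cells by averaging rectangular blocks."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(size):
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)  # every cell covers at least one row
        row = []
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)
            block = [img[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(img: Gray, size: int = 8) -> int:
    """Return a size*size-bit hash: bit is 1 where a cell is brighter than the mean."""
    cells = [p for row in downscale(img, size) for p in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for p in cells:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def find_matches(query: int, database: Dict[str, int], max_distance: int = 5) -> List[str]:
    """Names of stored hashes within max_distance bits of the query (assumed threshold)."""
    return sorted(name for name, h in database.items() if hamming(query, h) <= max_distance)


# Synthetic 32x32 test images: a horizontal gradient, a uniformly
# brightened copy of it, and an unrelated vertical gradient.
gradient = [[x * 8 for x in range(32)] for _ in range(32)]
brighter = [[p + 10 for p in row] for row in gradient]
vertical = [[y * 8 for _ in range(32)] for y in range(32)]

database = {"gradient": average_hash(gradient), "vertical": average_hash(vertical)}
matches = find_matches(average_hash(brighter), database)
```

In this sketch the brightened copy hashes close to the original because a uniform brightness shift moves every cell and the mean together, whereas the unrelated image lands many bits away. A linear scan over the stored hashes is used only for clarity; a real system would index the hashes for faster lookup.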
A Chinese team reported in July 2019 that they had developed a perceptual hash for [[speech encryption]] that proved to be effective. They were able to create a system that was not only more accurate, but more compact as well.<ref name=zhang19>{{cite journal |last1=Zhang |first1=Qiu-yu |last2=Zhou |first2=Liang |last3=Zhang |first3=Tao |last4=Zhang |first4=Deng-hai |title=A retrieval algorithm of encrypted speech based on short-term cross-correlation and perceptual hashing |journal=Multimedia Tools and Applications |date=July 2019 |volume=78 |issue=13 |pages=17825–17846 |doi=10.1007/s11042-019-7180-9 |s2cid=58010160 }}</ref>
[[Apple Inc]] reported as early as August 2021 a [[
In an essay entitled "The Problem With Perceptual Hashes", Oliver Kuederle produces a startling collision generated by a piece of commercial [[neural net]] software, of the NeuralHash type. A photographic portrait of a real woman ([[Adobe Stock]] #221271979) reduces through the test algorithm to
Researchers subsequently published a comprehensive analysis entitled "Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash", in which they investigate the vulnerability of NeuralHash, as a representative of deep perceptual hashing algorithms, to various attacks. Their results show that hash collisions between different images can be achieved with minor changes applied to the images. According to the authors, these results demonstrate that such attacks are realistic and could enable the flagging and possible prosecution of innocent users. They also state that the detection of illegal material can easily be avoided, and the system outsmarted, by simple image transformations such as those provided by free-to-use image editors. The authors assume their results apply to other deep perceptual hashing algorithms as well, questioning their overall effectiveness and functionality in applications such as [[client-side scanning]] and chat controls.<ref>{{cite
==See also==
==External links==
* [
* [
* [
[[Category:Hashing]]
[[Category:Google Search|Images]]
[[Category:Image search]]