Perceptual hashing: Difference between revisions

Alter: doi, template type. Add: work, isbn, pages, date, title, chapter, authors 1-4. Removed parameters. Some additions/deletions were parameter name changes. | Use this tool. Report bugs. | #UCB_Gadget
DivideBy0 (talk | contribs)
Development: Added more recent work in perceptual hashing
Line 5:
The 1980 work of [[Marr–Hildreth algorithm|Marr and Hildreth]] is a seminal paper in this field.<ref name=marr80>{{Cite journal |title=Theory of Edge Detection |first1=D. |last1=Marr |author1-link=David Marr (neuroscientist) |first2=E. |last2=Hildreth |author2-link=Ellen Hildreth |journal=Proceedings of the Royal Society of London. Series B, Biological Sciences |volume=207 |number=1167 |date=29 Feb 1980 |pages=187–217 |doi=10.1098/rspb.1980.0020|pmid=6102765 |bibcode=1980RSPSB.207..187M |s2cid=2150419 }}</ref>
 
In 2009, [[Microsoft Corporation]] developed [[PhotoDNA]] in collaboration with [[Hany Farid]], professor at [[Dartmouth College]].
PhotoDNA is a perceptual hashing technology developed to combat the online distribution of [[child sexual abuse material]] (CSAM). Provided by Microsoft at no cost, PhotoDNA is used by major software companies, NGOs and law enforcement agencies around the world.<ref name="nytpdna">{{cite news |last1=Lohr |first1=Steve |title=Microsoft Tackles the Child Pornography Problem |date= December 2009 |work= New York Times |url=https://archive.nytimes.com/bits.blogs.nytimes.com/2009/12/16/microsoft-tackles-the-child-pornography-problem/}}</ref>
 
The July 2010 thesis of Christoph Zauner introduces the topic and implements and benchmarks several perceptual image hash functions.<ref name="zauner10">{{cite book |last1=Zauner |first1=Christoph |title=Implementation and Benchmarking of Perceptual Image Hash Functions |date= July 2010 |publisher=Upper Austria University of Applied Sciences, Hagenberg Campus |url=https://www.phash.org/docs/pubs/thesis_zauner.pdf}}</ref>
Line 15 ⟶ 14:
 
In research published in November 2021, investigators examined a manipulated image of [[Stacey Abrams]] that was published on the internet before her loss in the [[2018 Georgia gubernatorial election]]. They found that the pHash algorithm was vulnerable to manipulation by malicious actors.<ref name="hao21">{{cite book |chapter-url=https://gangw.cs.illinois.edu/PHashing.pdf |doi=10.1145/3460120.3484559 |chapter=It's Not What It Looks Like: Manipulating Perceptual Hashing based Applications |title=Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security |date=2021 |last1=Hao |first1=Qingying |last2=Luo |first2=Licheng |last3=Jan |first3=Steve T.K. |last4=Wang |first4=Gang |pages=69–85 |isbn=978-1-4503-8454-4 }}</ref>
 
In August 2021, Apple announced an on-device CSAM scanner called NeuralHash. After a strong privacy backlash, it paused the rollout in September 2021 and formally cancelled it in December 2022.<ref name="wired2022">{{cite news |last=Newman |first=Lily Hay |title=Apple Kills Its Plan to Scan Your Photos for CSAM. Here's What's Next |url=https://www.wired.com/story/apple-photo-scanning-csam-communication-safety-messages/ |work=Wired |date=7 December 2022 |access-date=27 May 2025}}</ref>
 
Security researchers soon demonstrated that NeuralHash and similar deep perceptual hashes can be forced into hash collisions, or made to evade detection, through imperceptible image changes.<ref name="struppek22">{{cite conference |last1=Struppek |first1=Lukas |last2=Hintersdorf |first2=Dominik |last3=Neider |first3=Daniel |last4=Kersting |first4=Kristian |title=Learning to Break Deep Perceptual Hashing: The Use Case NeuralHash |book-title=Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22) |publisher=ACM |year=2022 |doi=10.1145/3531146.3533073}}</ref>
 
In October 2023 [[Meta Platforms|Meta]] introduced Stable Signature, an invisible watermark rooted in latent-diffusion generators, signalling a shift toward hybrid provenance schemes that combine watermarking with perceptual hashing.<ref name="meta2023">{{cite web |title=Stable Signature: A New Method for Watermarking Images Created by Generative AI |url=https://ai.meta.com/blog/stable-signature-watermarking-generative-ai/ |website=Meta AI Blog |date=20 October 2023 |access-date=27 May 2025}}</ref>
 
In 2025, the open-source DINOHash was introduced. It adversarially fine-tunes self-supervised DINOv2 features, and its authors report higher bit accuracy under heavy crops, compression and gradient-based adversarial attacks than NeuralHash or classical DCT–DWT schemes.<ref name="dinohash25">{{cite arxiv |title=Provenance Detection for AI-Generated Images: Combining Perceptual Hashing, Homomorphic Encryption, and AI Detection Models |last1=Singhi |first1=Shree |last2=Yadav |first2=Aayan |last3=Hassanizadeh |first3=Parisa |last4=Ebrahimi |first4=Shariar |last5=Gupta |first5=Aayush |arxiv=2503.11195 |year=2025}}</ref>
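
Bit accuracy, the robustness measure mentioned above, can be illustrated with a minimal sketch. The example below is an assumption-laden toy: it uses a simple average hash rather than DINOHash or NeuralHash, and the function names and the 4×4 sample image are hypothetical. It only shows how the fraction of matching hash bits is computed between an image and a modified copy.

```python
# Toy illustration only: a simple average hash, NOT DINOHash or NeuralHash.
# All names and the sample image below are hypothetical.

def average_hash(pixels):
    """Return a list of bits: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [1 if value > mean else 0 for value in flat]


def bit_accuracy(hash_a, hash_b):
    """Fraction of bit positions on which two equal-length hashes agree."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    matches = sum(a == b for a, b in zip(hash_a, hash_b))
    return matches / len(hash_a)


# Hypothetical 4x4 grayscale image (values 0-255): dark left half, bright right half.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]

# A mild, uniform brightness change should leave the hash unchanged.
brightened = [[min(255, value + 5) for value in row] for row in original]

# Inverting the image swaps dark and bright regions, flipping every bit.
inverted = [[255 - value for value in row] for row in original]

same = bit_accuracy(average_hash(original), average_hash(brightened))
flipped = bit_accuracy(average_hash(original), average_hash(inverted))
```

In this toy case the brightened copy keeps every hash bit while the inverted copy flips all of them. Real perceptual hashes are evaluated in the same way but over far harsher transformations, such as crops, compression and adversarial perturbations.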
 
==Characteristics==