Revision as of 13:41, 25 August 2025 by Warudo: Reverted 2 edits by Calga170: Emoji is an acceptable plural form and country names should not be linked per MOS:OVERLINK (Tags: Twinkle, Undo)
Revision as of 15:49, 27 August 2025 by Citation bot: Removed URL that duplicated identifier. Suggested by Headbomb.
Line 846: === Security<span class="anchor" id="Security issues"></span> === Unicode has a large number of [[homoglyphs]], many of which look very similar or identical to ASCII letters. Substituting these can produce an identifier or URL that looks correct but directs to a different location than expected.<ref>{{Cite web \|title=UTR #36: Unicode Security Considerations \|url=https://unicode.org/reports/tr36/ \|website=Unicode}}</ref> Homoglyphs can also be used to manipulate the output of [[NLP (computer science)\|natural language processing (NLP)]] systems.<ref>{{Cite book \|last1=Boucher \|first1=Nicholas \|last2=Shumailov \|first2=Ilia \|last3=Anderson \|first3=Ross \|last4=Papernot \|first4=Nicolas \|title=2022 IEEE Symposium on Security and Privacy (SP) \|chapter=Bad Characters: Imperceptible NLP Attacks \|year=2022 ~~\|chapter-url=https://ieeexplore.ieee.org/document/9833641~~ \|location=San Francisco, CA, US \|publisher=IEEE \|pages=1987–2004 \|arxiv=2106.09898 \|doi=10.1109/SP46214.2022.9833641 \|isbn=978-1-66541-316-9 \|s2cid=235485405}}</ref> Mitigation requires disallowing these characters, displaying them differently, or requiring that they resolve to the same identifier;<ref>{{Cite web \|last=Engineering \|first=Spotify \|date=2013-06-18 \|title=Creative usernames and Spotify account hijacking \|url=https://engineering.atspotify.com/2013/06/creative-usernames/ \|access-date=2023-04-15 \|website=Spotify Engineering \|language=en-US}}</ref> all of this is complicated by the huge and constantly changing set of characters.<ref>{{cite tech report \| last=Wheeler \| first=David A. 
\| title=Initial Analysis of Underhanded Source Code \| year=2020 \| jstor=resrep25332.7 \| url=http://www.jstor.org/stable/resrep25332.7 \| page=4–1–4–10}}</ref><ref>{{Cite web \|title=UTR #36: Unicode Security Considerations \|url=https://unicode.org/reports/tr36/ \|access-date=27 June 2022 \|website=Unicode}}</ref> In 2021, two researchers, one from the [[University of Cambridge]] and the other from the [[University of Edinburgh]], released a security advisory asserting that [[Bidirectional Text#Bidirectional text#Explicit formatting\|BiDi marks]] can be used to make large sections of code do something different from what they appear to do. The problem was named "[[Trojan Source]]".<ref>{{Cite web \|first1=Nicholas \|last1=Boucher \|first2=Ross \|last2=Anderson \|title=Trojan Source: Invisible Vulnerabilities \|url=https://www.trojansource.codes/trojan-source.pdf \|access-date=2 November 2021}}</ref> In response, code editors started highlighting marks to indicate forced text-direction changes.<ref>{{Cite web \|title=Visual Studio Code October 2021 \|url=https://code.visualstudio.com/updates/v1_62#_unicode-directional-formatting-characters \|access-date=11 November 2021 \|website=code.visualstudio.com \|language=en}}</ref>
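As an illustration of the kind of check an editor or code-review tool might perform, the following minimal Python sketch reports explicit bidirectional formatting characters in a piece of source text. The function name <code>find_bidi_controls</code> and the choice to flag only the nine embedding, override, and isolate characters are assumptions made for this example; it does not reproduce the behaviour of any particular editor.

```python
import unicodedata

# Explicit bidirectional formatting characters: embeddings, overrides,
# isolates, and their terminators. Restricting the check to these nine
# code points is an assumption of this sketch.
BIDI_CONTROLS = {
    "\u202a",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c",  # POP DIRECTIONAL FORMATTING
    "\u202d",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066",  # LEFT-TO-RIGHT ISOLATE
    "\u2067",  # RIGHT-TO-LEFT ISOLATE
    "\u2068",  # FIRST STRONG ISOLATE
    "\u2069",  # POP DIRECTIONAL ISOLATE
}


def find_bidi_controls(source):
    """Return a list of (line, column, character name) tuples, 1-indexed,
    for every explicit bidi formatting character found in source."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits


if __name__ == "__main__":
    sample = "a = 1\nb = '\u202e2'\n"
    for line_no, col, name in find_bidi_controls(sample):
        print(f"line {line_no}, column {col}: {name}")
```

A tool built along these lines could reject such characters outright or render them visibly, which corresponds to the "disallowing" and "displaying them differently" mitigations described above.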

Unicode: Difference between revisions