→Codespace and code points: Move reason for "U+" into a {{refn}} note
(8 intermediate revisions by 6 users not shown)
Line 75:
[[File:Unicode sample.png|class=skin-invert-image|thumb|right|200px|Many modern applications can render a substantial subset of the many [[scripts in Unicode]], as demonstrated by this screenshot from the [[OpenOffice.org]] application.]]<!-- screenshot fair use rationale: this screenshot is used specifically to illustrate the Unicode-related capabilities of modern desktop applications and the breadth of supported Unicode scripts -->
{{As of|September 2024}}, a total of 168<ref>{{Cite web |title=Supported Scripts |url=https://www.unicode.org/standard/supported.html |access-date=16 September 2022 |website=Unicode}}</ref>
=== Proposals for adding scripts ===
The Unicode Roadmap Committee ([[Michael Everson]], Rick McGowan, Ken Whistler, V.S. Umamaheswaran)<ref>{{Cite web |title=Roadmap to the BMP |url=https://www.unicode.org/roadmaps/bmp/ |access-date=30 July 2018 |publisher=[[Unicode Consortium]]}}</ref> maintains the list of scripts that are candidates or potential candidates for encoding and their tentative code block assignments on the Unicode Roadmap<ref>{{Cite web|url=https://www.unicode.org/roadmaps/|title=Roadmaps to Unicode|website=Unicode |url-status=live |archive-url= https://web.archive.org/web/20231208091250/http://www.unicode.org/roadmaps/ |archive-date= Dec 8, 2023 }}</ref> page of the [[Unicode Consortium]] website. For some scripts on the Roadmap, such as [[Jurchen script|Jurchen]] and [[Khitan large script]], encoding proposals have been made and are working their way through the approval process. For other scripts, such as [[Numidian language|Numidian]] and [[Rongorongo]], no proposal has yet been made, and they await agreement on character repertoire and other details from the user communities involved.
Line 85:
There is also a [[Medieval Unicode Font Initiative]] focused on special Latin medieval characters. Some of these proposals have already been included in Unicode.
The Script Encoding Initiative (SEI),<ref>{{Cite web|url=https://sei.berkeley.edu/ |title=Script Encoding Initiative |website=Script Encoding Initiative |url-status=live |archive-url=https://web.archive.org/web/20230325131114/https://linguistics.berkeley.edu/sei/ |archive-date= Mar 25, 2023 }}</ref> a project created by Deborah Anderson at the [[University of California, Berkeley]], was founded in 2002 with the goal of funding proposals for scripts not yet encoded in the standard. Now run by Anushah Hossain, SEI has become a major source of proposed additions to the standard in recent years.<ref>{{Cite web |title=About The Script Encoding Initiative |url=https://www.unicode.org/pending/about-sei.html |access-date=4 June 2012 |publisher=The Unicode Consortium}}</ref> Although SEI collaborates with the Unicode Consortium and the ISO/IEC 10646 standards process, it operates independently, supporting the technical, linguistic, and historical research needed to prepare formal proposals. SEI maintains a database of scripts that have yet to be encoded in the Unicode Standard on the project's website.<ref>{{Cite web |title=Scripts to Encode |url=https://sei.berkeley.edu/scripts-to-encode/ }}</ref>
Line 846:
=== Security<span class="anchor" id="Security issues"></span> ===
Unicode has a large number of [[homoglyphs]], many of which look very similar or identical to ASCII letters. Substituting these can produce an identifier or URL that looks correct but directs to a different location than expected.<ref>{{Cite web |title=UTR #36: Unicode Security Considerations |url=https://unicode.org/reports/tr36/ |website=Unicode}}</ref> Homoglyphs can also be used to manipulate the output of [[NLP (computer science)|natural language processing (NLP)]] systems.<ref>{{Cite book |last1=Boucher |first1=Nicholas |last2=Shumailov |first2=Ilia |last3=Anderson |first3=Ross |last4=Papernot |first4=Nicolas |title=2022 IEEE Symposium on Security and Privacy (SP) |chapter=Bad Characters: Imperceptible NLP Attacks |year=2022}}</ref>
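As a rough illustration of homoglyph substitution, the following Python sketch flags strings whose letters appear to come from more than one script. It is a heuristic only: it takes the first word of each letter's Unicode character name as a stand-in for its script, which is an assumption made for brevity and not the Script property or the confusable-detection procedure defined by the Unicode Consortium. The function names <code>scripts_used</code> and <code>looks_mixed_script</code> and the sample strings are hypothetical.

```python
import unicodedata


def scripts_used(text: str) -> set[str]:
    """Return rough script labels for the letters in text.

    Heuristic: the first word of a letter's character name
    (e.g. "LATIN", "CYRILLIC") is used as its script label.
    """
    labels = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                labels.add(name.split()[0])
    return labels


def looks_mixed_script(text: str) -> bool:
    """True if the letters in text carry more than one script label."""
    return len(scripts_used(text)) > 1


genuine = "paypal"
# Hypothetical spoof: U+0430 CYRILLIC SMALL LETTER A replaces each Latin "a".
spoofed = "p\u0430yp\u0430l"

print(genuine == spoofed)           # the strings differ despite looking alike
print(looks_mixed_script(genuine))
print(looks_mixed_script(spoofed))
```

A check of this kind would also flag legitimate mixed-script text, so it is better suited to warning a reader than to rejecting input outright.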
A security advisory was released in 2021 by two researchers, one from the [[University of Cambridge]] and the other from the [[University of Edinburgh]], asserting that [[Bidirectional text#Explicit formatting|BiDi marks]] can be used to make large sections of code do something different from what they appear to do. The problem was named "[[Trojan Source]]".<ref>{{Cite web |first1=Nicholas |last1=Boucher |first2=Ross |last2=Anderson |title=Trojan Source: Invisible Vulnerabilities |url=https://www.trojansource.codes/trojan-source.pdf |access-date=2 November 2021}}</ref> In response, code editors started highlighting marks to indicate forced text-direction changes.<ref>{{Cite web |title=Visual Studio Code October 2021 |url=https://code.visualstudio.com/updates/v1_62#_unicode-directional-formatting-characters |access-date=11 November 2021 |website=code.visualstudio.com |language=en}}</ref>
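The kind of highlighting described above can be approximated with a simple scan for explicit directional formatting characters. The Python sketch below is a minimal illustration, not the implementation used by any particular editor; the table <code>BIDI_CONTROLS</code>, the function <code>find_bidi_controls</code>, and the sample source string are all hypothetical. It covers the embedding, override, and isolate characters (U+202A to U+202E and U+2066 to U+2069) and reports where each one occurs.

```python
# Explicit directional formatting characters and their short names.
BIDI_CONTROLS = {
    "\u202A": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202B": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202C": "PDF",  # POP DIRECTIONAL FORMATTING
    "\u202D": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202E": "RLO",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066": "LRI",  # LEFT-TO-RIGHT ISOLATE
    "\u2067": "RLI",  # RIGHT-TO-LEFT ISOLATE
    "\u2068": "FSI",  # FIRST STRONG ISOLATE
    "\u2069": "PDI",  # POP DIRECTIONAL ISOLATE
}


def find_bidi_controls(source: str) -> list[tuple[int, int, str]]:
    """Return (line, column, name) for each control found, both 1-based."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                hits.append((line_no, col, BIDI_CONTROLS[ch]))
    return hits


# Hypothetical snippet with controls hidden inside a string literal.
sample = (
    'access = "user"\n'
    'if access != "admin\u202E \u2066# check\u2069 \u2066":\n'
    "    pass\n"
)

hits = find_bidi_controls(sample)
for line_no, col, name in hits:
    print(f"line {line_no}, column {col}: {name}")
```

Because these characters also occur in legitimate right-to-left text, a scan like this is best treated as a prompt for manual review.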