Unicode: Difference between revisions

Sususi (talk | contribs)
No edit summary
Line 19:
'''Unicode''' (also known as '''''The Unicode Standard''''' and '''TUS'''<ref>{{Cite web|date=27 March 2002 |title=Unicode Technical Report #28: Unicode 3.2 |url=https://www.unicode.org/reports/tr28/tr28-3.html#errata |access-date=23 June 2022 |website=Unicode Consortium}}</ref><ref>{{Cite web |last=Jenkins |first=John H. |date=26 August 2021 |title=Unicode Standard Annex #45: U-source Ideographs |url=https://www.unicode.org/reports/tr45/tr45-25.html |access-date=23 June 2022 |website=Unicode Consortium |at=§2.2 The Source Field}}</ref>) is a [[character encoding]] standard maintained by the [[Unicode Consortium]] designed to support the use of text in all of the world's [[writing system]]s that can be digitized. Version 16.0{{efn-ua|name=standard-latest}} defines 154,998 [[Character (computing)|characters]] and 168 [[script (Unicode)|scripts]]<ref>{{multiref |<!-- Graphic + Format count is used here -->{{Cite web|url=https://www.unicode.org/versions/stats/charcountv16_0.html|title=Unicode Character Count V16.0 |date=10 September 2024 |publisher=The Unicode Consortium}} | {{Cite web|title=Unicode 16.0 Versioned Charts Index|url=https://www.unicode.org/charts/PDF/Unicode-16.0/ |publisher=The Unicode Consortium |date=10 September 2024}} | {{Cite web |title=Supported Scripts |url=https://www.unicode.org/standard/supported.html |access-date=11 September 2024 |date=10 September 2024 |publisher=The Unicode Consortium}} }}</ref> used in various ordinary, literary, academic, and technical contexts.
 
Unicode has largely supplanted the previous environment of myriad incompatible [[character sets]] used within different locales and on different computer architectures. The entire repertoire of these sets, plus many additional characters, was merged into the single Unicode set. Unicode is used to encode the vast majority of text on the Internet, including most [[web pages]], and Unicode support has become a common consideration in contemporary software development. The Unicode codespace spans 1,114,112 code points (U+0000 to U+10FFFF), giving it room to encode more than 1.1 million characters.
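
The figure of more than 1.1 million can be checked with a little arithmetic. The following is a minimal sketch in Python, assuming the standard layout of the codespace as 17 planes of 65,536 code points each and a block of 2,048 surrogate code points that cannot encode characters on their own. The variable names are illustrative only and do not come from the standard.

```python
import sys

# Assumption: the codespace is organized as 17 planes of 0x10000 code points.
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

codespace_size = PLANES * CODE_POINTS_PER_PLANE
highest_code_point = codespace_size - 1

# Surrogate code points U+D800..U+DFFF are reserved for UTF-16 and are
# excluded from the set of Unicode scalar values.
SURROGATES = 0xE000 - 0xD800  # 2,048
scalar_values = codespace_size - SURROGATES

print(codespace_size)            # 1114112
print(hex(highest_code_point))   # 0x10ffff
print(scalar_values)             # 1112064

# On Python 3.3 and later, sys.maxunicode reports the same upper bound.
print(sys.maxunicode == highest_code_point)
```

Only a fraction of this space is assigned so far, which is why the number of defined characters is much smaller than the size of the codespace.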
 
The Unicode [[character repertoire]] is synchronized with [[Universal Coded Character Set|ISO/IEC 10646]], the two being code-for-code identical. However, ''The Unicode Standard'' is more than just a repertoire within which characters are assigned. To aid developers and designers, the standard also provides charts and reference data, as well as annexes that explain concepts germane to various scripts and provide guidance for their implementation. Topics covered by these annexes include [[Unicode equivalence#Normalization|character normalization]], [[Combining character|character composition]] and decomposition, [[Unicode collation algorithm|collation]], and [[Bidirectional text#Unicode bidi support|directionality]].<ref>{{Cite web |title=The Unicode Standard: A Technical Introduction |url=https://www.unicode.org/standard/principles.html |date=22 August 2019 |access-date=11 September 2024}}</ref>
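
Several of these topics can be observed directly from a programming language. The following is a minimal sketch using Python's standard <code>unicodedata</code> module, which exposes part of the Unicode Character Database. The sample characters and variable names are chosen for illustration, and collation is not shown because that module does not implement it.

```python
import unicodedata

# Two encodings of the same visible text "é".
composed = "\u00e9"      # single precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The raw strings differ even though they render identically.
raw_equal = composed == decomposed

# Normalization maps canonically equivalent strings to one form.
nfc = unicodedata.normalize("NFC", decomposed)  # composition
nfd = unicodedata.normalize("NFD", composed)    # decomposition

# A non-zero canonical combining class marks a combining character.
accent_class = unicodedata.combining("\u0301")

# Bidirectional categories drive the layout of mixed-direction text.
latin_bidi = unicodedata.bidirectional("A")        # left-to-right
hebrew_bidi = unicodedata.bidirectional("\u05d0")  # HEBREW LETTER ALEF

print(raw_equal)                  # False
print(nfc == composed)            # True
print(nfd == decomposed)          # True
print(accent_class)               # 230
print(latin_bidi, hebrew_bidi)    # L R
```

Comparing strings only after normalizing both to the same form is a common way to avoid treating canonically equivalent text as different.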