→top: Use normal wikipedia wording
→top: Rearranged a bit and added note about UTF-16
Line 19:
'''Unicode''' or '''''The Unicode Standard''''' or '''TUS'''<ref>{{Cite web|date=27 March 2002 |title=Unicode Technical Report #28: Unicode 3.2 |url=https://www.unicode.org/reports/tr28/tr28-3.html#errata |access-date=23 June 2022 |website=Unicode Consortium}}</ref><ref>{{Cite web |last=Jenkins |first=John H. |date=26 August 2021 |title=Unicode Standard Annex #45: U-source Ideographs |url=https://www.unicode.org/reports/tr45/tr45-25.html |access-date=23 June 2022 |website=Unicode Consortium |at=§2.2 The Source Field}}</ref> is a [[character encoding]] standard maintained by the [[Unicode Consortium]] designed to support the use of text in all of the world's [[writing system]]s that can be digitized. Version 16.0{{efn-ua|name=standard-latest}} defines 154,998 [[Character (computing)|characters]] and 168 [[script (Unicode)|scripts]]<ref>{{multiref |<!-- Graphic + Format count is used here -->{{Cite web|url=https://www.unicode.org/versions/stats/charcountv16_0.html|title=Unicode Character Count V16.0 |date=10 September 2024 |publisher=The Unicode Consortium}} | {{Cite web|title=Unicode 16.0 Versioned Charts Index|url=https://www.unicode.org/charts/PDF/Unicode-16.0/ |publisher=The Unicode Consortium |date=10 September 2024}} | {{Cite web |title=Supported Scripts |url=https://www.unicode.org/standard/supported.html |access-date=11 September 2024 |date=10 September 2024 |publisher=The Unicode Consortium}} }}</ref> used in various ordinary, literary, academic, and technical contexts.
Many common characters, including numerals, punctuation, and other symbols, are unified within the standard and are not treated as specific to any given writing system. Unicode encodes 3,790 [[emoji]], whose continued development is conducted by the Consortium as part of the standard.<ref>{{Cite web |title=Emoji Counts, v16.0 |url=https://www.unicode.org/emoji/charts-16.0/emoji-counts.html |access-date=10 September 2024 |publisher=The Unicode Consortium}}</ref> Moreover, the widespread adoption of Unicode was in large part responsible for the initial popularization of emoji outside of Japan.{{citation needed|date=June 2025}} Unicode is ultimately capable of encoding more than 1.1 million characters.
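The figure of more than 1.1 million follows from the size of the Unicode codespace. The following is a minimal sketch of that arithmetic in Python. It assumes the standard's layout of 17 planes of 65,536 code points each, with 2,048 code points reserved as UTF-16 surrogates. These figures are not stated in the text above, and the variable names are purely illustrative.

```python
import sys

# Assumed layout: 17 planes (0 through 16), each holding 0x10000 code points.
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000

total_code_points = PLANES * CODE_POINTS_PER_PLANE
last_code_point = total_code_points - 1

# Surrogate code points U+D800..U+DFFF are reserved for UTF-16 and never
# represent characters on their own.
surrogates = 0xDFFF - 0xD800 + 1
scalar_values = total_code_points - surrogates

# Python 3.3 and later expose the highest code point as sys.maxunicode.
assert sys.maxunicode == last_code_point

print(f"code points:   {total_code_points:,}")
print(f"last code pt:  U+{last_code_point:X}")
print(f"scalar values: {scalar_values:,}")
```

Only a fraction of this space is assigned: the character count given for version 16.0 is far below the upper bound computed here.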
Unicode has largely supplanted the previous environment of a myriad of incompatible [[character sets]], each used within different locales and on different computer architectures. Unicode is used to encode the vast majority of text on the Internet, including most [[web pages]], and relevant Unicode support has become a common consideration in contemporary software development.
The Unicode [[character repertoire]] is synchronized with [[Universal Coded Character Set|ISO/IEC 10646]], the two being code-for-code identical. However, ''The Unicode Standard'' is more than just a repertoire within which characters are assigned. To aid developers and designers, the standard also provides charts and reference data, as well as annexes that explain concepts germane to various scripts and give guidance for their implementation. Topics covered by these annexes include [[Unicode equivalence#Normalization|character normalization]], [[Combining character|character composition]] and decomposition, [[Unicode collation algorithm|collation]], and [[Bidirectional text#Unicode bidi support|directionality]].<ref>{{Cite web |title=The Unicode Standard: A Technical Introduction |url=https://www.unicode.org/standard/principles.html |date=22 August 2019 |access-date=11 September 2024}}</ref>
Unicode text is processed and stored as binary data [[comparison of Unicode encodings|using one of several encodings]], which define how to translate the standard's abstracted codes for characters into sequences of bytes. ''The Unicode Standard'' itself defines three encodings: [[UTF-8]], [[UTF-16]], and [[UTF-32]], though several others exist. Of these, UTF-8 is the most widely used by a large margin, in part due to its backwards-compatibility with [[ASCII]].
Unicode text is processed and stored as binary data [[comparison of Unicode encodings|using one of several encodings]], which define how to translate the standard's abstracted codes for characters into sequences of bytes. ''The Unicode Standard'' itself defines three encodings: [[UTF-8]], [[UTF-16]],{{efn|A large amount of documentation for Windows incorrectly uses the term "Unicode" to mean ''only'' the UTF-16 encoding.}} and [[UTF-32]], though several others exist.
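How the same code point becomes different byte sequences under each encoding can be illustrated with a short Python sketch. The helper name <code>encoded_bytes</code> and the three sample characters are hypothetical choices made for this illustration. The big-endian codec variants are used so that no byte order mark is prepended.

```python
def encoded_bytes(ch):
    """Return the bytes of one character under the three encodings.

    The "-be" codecs emit big-endian bytes without a byte order mark.
    """
    return {
        "UTF-8": ch.encode("utf-8"),
        "UTF-16": ch.encode("utf-16-be"),
        "UTF-32": ch.encode("utf-32-be"),
    }


# Sample characters: an ASCII letter, the euro sign (U+20AC), and an emoji
# outside the Basic Multilingual Plane (U+1F600).
samples = ["A", "\u20ac", "\U0001F600"]

for ch in samples:
    row = encoded_bytes(ch)
    hex_forms = {name: data.hex() for name, data in row.items()}
    print(f"U+{ord(ch):04X}", hex_forms)
```

In this sketch the ASCII letter keeps its single-byte ASCII value in UTF-8 but takes two bytes in UTF-16 and four in UTF-32, while the character outside the Basic Multilingual Plane takes four bytes in all three encodings.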
== Origin and development ==