{{short description|Subset of characters in Unicode}}
{{More citations needed|date=June 2024}}
{{about|writing systems found in Unicode|the "Script" style of Latin letters in Unicode|Mathematical Alphanumeric Symbols|and|Script typeface}}
<!-- [[File:Armenian language in the Armenian alphabet.svg|thumb|[[Armenian script]]]] -->
{{ISO 15924/unicode-script-illustration}}
In [[Unicode]], a '''script''' is a collection of [[Letter (alphabet)|letter]]s and other written signs used to represent textual information in one or more [[writing system]]s.<ref>{{cite web|url=http://unicode.org/glossary/|title=Glossary|website=unicode.org}}</ref> Some scripts support
The unified [[Combining Diacritical Marks for Symbols|diacritical character]]s and unified [[General Punctuation|punctuation characters]] frequently have the "common" or "inherited" script property. However, the individual scripts often have their own [[punctuation]] and [[diacritic]]s, so that many scripts include not only letters
Unicode
== Definition and classification ==
When multiple languages make use of the same script, there are frequently some differences
=== Script versus writing system ===
Most writing systems can be broadly divided into several categories: '''logographic''', '''syllabic''', '''alphabetic''' (or '''segmental'''), '''abugida''', '''abjad''' and '''featural'''. However, features of any of these categories may be found in a given writing system in varying proportions, which often makes it difficult to assign a system to a single category. The term ''[[complex system]]'' is sometimes used to describe systems whose mixture of features makes classification problematic.
Unicode supports all of these types of writing systems through its numerous scripts. Unicode also adds further properties to characters to help differentiate the various characters and the ways they behave within Unicode text
=== {{anchor|Common and
In addition to explicit or specific script properties, Unicode uses three special values:<ref name=Unicode_script_property>{{cite web|url=https://www.unicode.org/reports/tr24/|title=UAX #24: Unicode Script Property|website=www.unicode.org}}</ref>
;Common: Unicode can assign a character in the [[Universal Character Set|UCS]] to a single script only. However, many
;Inherited: Many diacritics and non-spacing combining characters may be applied to characters from more than one script. In these cases Unicode assigns them to the "inherited" script (ISO 15924 code Zinh), which means that they have the same script class as the base character with which they combine, and so in different contexts they may be treated as belonging to different scripts. For example, {{unichar|0308|Combining Diaeresis|cwith=}} may combine
;Unknown: The value of "unknown" script (ISO 15924 code Zzzz) is given to unassigned, private
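
The resolution of the "inherited" value can be illustrated with a minimal sketch. Python's standard library does not expose the script property, so the example assumes a small hand-written table (<code>SCRIPT_OF</code>, a hypothetical name) covering only the characters it uses; the choice of a Latin and a Greek base letter is illustrative only. A combining mark tagged "Inherited" takes the script of the character that precedes it:

```python
# Hypothetical, hand-written lookup covering only the characters used here.
# Real script assignments come from the Unicode Character Database.
SCRIPT_OF = {
    "a": "Latin",
    "\u03b9": "Greek",      # GREEK SMALL LETTER IOTA
    "\u0308": "Inherited",  # COMBINING DIAERESIS
    " ": "Common",
}


def resolve_scripts(text):
    """Return (character, script) pairs with Inherited resolved to the base."""
    resolved = []
    previous = None  # script of the preceding character, if any
    for ch in text:
        script = SCRIPT_OF[ch]  # raises KeyError for characters not in the table
        if script == "Inherited" and previous is not None:
            # An inherited mark takes the script of the character before it.
            script = previous
        resolved.append((ch, script))
        previous = script
    return resolved


print(resolve_scripts("a\u0308"))       # diaeresis resolved against a Latin base
print(resolve_scripts("\u03b9\u0308"))  # same mark resolved against a Greek base
```

The same code point is therefore reported under different scripts depending on its base. A mark with no preceding character is left as "Inherited" in this sketch, which is a simplification.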
== Character categories within scripts ==
Unicode provides a general category property for each character, so in addition to belonging to a script, every character also has a general category. Typically scripts include letter characters including: uppercase letters, lowercase
Most writing systems do not differentiate between uppercase and lowercase letters. For those scripts, all letters are categorized as "other letter" or "modifier letter". Ideographs such as Unihan ideographs are also categorized as "other letter". A few scripts do differentiate between uppercase and lowercase, however, including Latin, Cyrillic, Greek, Armenian, Georgian, and Deseret. Even in these scripts, some letters are neither uppercase nor lowercase.
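
The letter categories described above can be inspected with Python's standard <code>unicodedata</code> module, which reports the general category as a two-letter code (<code>Lu</code> uppercase, <code>Ll</code> lowercase, <code>Lt</code> titlecase, <code>Lm</code> modifier, <code>Lo</code> other). The sample characters and the helper name <code>letter_categories</code> are illustrative choices, not part of any standard:

```python
import unicodedata

# Sample letters: cased Latin letters, two Latin-script letters that are
# neither uppercase nor lowercase, a modifier letter, and uncased letters.
SAMPLES = ["A", "a", "\u01c5", "\u01c0", "\u02b0", "\u05d0", "\u4e2d"]


def letter_categories(chars):
    """Map each character to its two-letter general category code."""
    return {ch: unicodedata.category(ch) for ch in chars}


for ch, cat in letter_categories(SAMPLES).items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {cat}")
```

In this sample, U+01C5 (a titlecase digraph) and U+01C0 (a click letter) are Latin-script letters that are neither uppercase nor lowercase, while the Hebrew letter and the CJK ideograph both fall under "other letter".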
Scripts can also contain any other general category character such as '''marks''' (diacritic and otherwise), '''numbers''' (numerals), '''punctuation''', '''separators''' (word separators such as spaces), '''symbols''' and non-graphical '''format''' characters. These are included in a particular script when they are unique to that script. Other such characters are generally unified and included in the punctuation or diacritic blocks. However, the bulk of characters in any script (other than the common and inherited scripts) are letters.
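
As a rough illustration of these broader classes, the first letter of the general category code identifies the major class of a character. The sketch below tallies the classes in a short string using Python's <code>unicodedata</code> module; the mapping name <code>MAJOR_CLASS</code> and the function name are hypothetical, and the "other" class lumps together control, format, private-use, surrogate and unassigned code points:

```python
import unicodedata
from collections import Counter

# First letter of the general category code -> major class name.
MAJOR_CLASS = {
    "L": "letter",
    "M": "mark",
    "N": "number",
    "P": "punctuation",
    "Z": "separator",
    "S": "symbol",
    "C": "other",  # control, format, private use, surrogate, unassigned
}


def major_class_counts(text):
    """Count the characters of a string by major general category class."""
    return Counter(MAJOR_CLASS[unicodedata.category(ch)[0]] for ch in text)


# "e" + combining acute accent, spaces, an equals sign, a digit, an exclamation mark
print(major_class_counts("e\u0301 = 2!"))
```

Note that the general category is independent of the script property: the tally says nothing about which script each character belongs to.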
== <span class="anchor" id="List of scripts in Unicode"></span> List of encoded scripts ==
{{As of|September 2024|alt=As of version 16.0}}, Unicode defines 168 scripts (called "Alias" or "Property value alias") based on the ISO 15924 list. In addition, Unicode assigns the name "Common" to ISO 15924's {{code|Zyyy}} code for undetermined scripts, "Inherited" to ISO 15924's {{code|Zinh}} code for inherited scripts, and "Unknown" to ISO 15924's {{code|Zzzz}} code for uncoded scripts. Some script codes defined by ISO 15924 are not used in Unicode, including {{code|Zsym}} (Symbols) and {{code|Zmth}} (Mathematical notation).
{{ISO 15924 script codes and related Unicode data|state=uncollapsed}}
== Missing scripts in Unicode ==
The Missing Scripts project, with contributors from the [[Mainz University of Applied Sciences]], the Atelier national de recherche typographique (ANRT) in [[Nancy, France|Nancy]], and the [[University of California, Berkeley]], has compiled a list of 131 scripts that have not yet been encoded in ''The Unicode Standard'', out of a total of 294 scripts recognized according to the current state of research.<ref>{{Cite web |title=The World's Writing Systems |url=https://www.worldswritingsystems.org/ |access-date=2024-10-04 |website=www.worldswritingsystems.org}}</ref>
==See also==
==References==
{{Reflist}}
==External links==
* [https://sei.berkeley.edu/ Script Encoding Initiative], a project at UC Berkeley, USA, working to get more scripts included in the Unicode Standard.
* [https://www.worldswritingsystems.org The World’s Writing Systems], an overview of all 294 known writing systems, each with a typographic reference glyph and its Unicode status.
{{Unicode navigation}}