Script (Unicode): Difference between revisions

Monkbot (talk | contribs)
m Task 18 (cosmetic): eval 4 templates: del empty params (6×);
{{about|writing systems found in Unicode|the "Script" style of Latin letters in Unicode|Mathematical Alphanumeric Symbols|and|Script typeface}}[[File:Armenian language in the Armenian alphabet.svg|thumb|[[Armenian script]]]]
 
In [[Unicode]], a '''script''' is a collection of [[Letter (alphabet)|letter]]s and other written signs used to represent textual information in one or more [[writing system]]s.<ref>{{cite web|url=http://unicode.org/glossary/|title=Glossary|author=|date=|website=unicode.org}}</ref> Some scripts support only one writing system and [[Written language|language]]; for example, [[Armenian language|Armenian]]. Other scripts support many different writing systems; for example, the [[Latin script in Unicode|Latin script]] supports [[English alphabet|English]], [[French alphabet|French]], [[German alphabet|German]], [[Italian alphabet|Italian]], [[Vietnamese language|Vietnamese]], [[Latin alphabet|Latin]] itself, and several other languages. Some languages make use of multiple alternate writing systems, and thus also use several scripts; for example, [[Turkish language|Turkish]] was written in the [[Ottoman Turkish alphabet|Arabic]] script before the 20th century, but switched to the Latin script in the early part of the 20th century. For a list of languages supported by each script, see the [[list of languages by writing system]]. More or less complementary to scripts are [[Unicode symbols|symbols]] and Unicode [[control character]]s.
 
The unified [[Combining Diacritical Marks for Symbols|diacritical character]]s and unified [[General Punctuation|punctuation characters]] frequently have the "common" or "inherited" script property. However, the individual scripts often have their own [[punctuation]] and [[diacritic]]s, so that many scripts include not only letters, but also diacritic and other marks, punctuation, numerals and even their own idiosyncratic symbols and [[Space (punctuation)|space]] characters.
 
Unicode 13.0 defines 154 separate scripts, including 91 modern scripts and 63 ancient or historic scripts.<ref>{{cite web|url=https://www.unicode.org/Public/UNIDATA/Scripts.txt|title=Unicode Character Database: Scripts|author=|date=|website=unicode.org}}</ref><ref>{{cite book | title = The Unicode Standard, Version 6.2 | chapter = Chapter 14: Additional Ancient and Historic Scripts | publisher = Unicode, Inc | date = September 2012 | location = Mountain View, CA | pages = 473 | url = https://www.unicode.org/versions/Unicode6.2.0/ch14.pdf | isbn = 978-1-936213-07-8 }}</ref> More scripts are in the process of being encoded or have been tentatively allocated for encoding in roadmaps.<ref>[https://www.unicode.org/roadmaps/ Roadmaps to Unicode]</ref>
 
== Definition and classification ==
 
=== {{anchor|Common and inherited scripts}}{{anchor|Special script property values}}Special script property values ===
In addition to explicit or specific script properties, Unicode uses three special values:<ref name=Unicode_script_property>{{cite web|url=https://www.unicode.org/reports/tr24/|title=UAX #24: Unicode Script Property|author=|date=|website=www.unicode.org}}</ref>
;Common: Unicode can assign a character in the [[Universal Character Set|UCS]] to a single script only. However, many characters, namely those that are not part of a formal natural-language writing system or that are unified across many writing systems, may be used in more than one script; examples include currency signs, symbols, numerals and punctuation marks. In these cases Unicode defines them as belonging to the "common" script ([[ISO 15924]] code "Zyyy").
;Inherited: Many diacritics and non-spacing combining characters may be applied to characters from more than one script. In these cases Unicode assigns them to the "inherited" script (ISO 15924 code "Zinh"), which means that they have the same script class as the base character with which they combine, and so in different contexts they may be treated as belonging to different scripts. For example, {{unichar|0308|Combining Diaeresis|cwith=}} may combine with either {{unichar|0065|Latin Small Letter E}} to create a Latin "ë", or with {{unichar|0435|Cyrillic Small Letter IE}} for the Cyrillic "ё". In the former case it inherits the Latin script of the base character whereas in the latter case it inherits the Cyrillic script of the base character.
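
The inheritance behaviour described for the combining diaeresis can be illustrated with a minimal Python sketch. The table <code>SCRIPT_OF</code> and the function <code>resolve_scripts</code> are hypothetical names introduced only for this illustration, and the table is a hand-written excerpt of four characters rather than the real Unicode data; an actual implementation would presumably read the full <code>Scripts.txt</code> file. The sketch also makes the simplifying assumption that a combining mark inherits from the character immediately before it.

```python
# Hypothetical, hand-written excerpt of script assignments (illustration only).
# A real implementation would load the complete Scripts.txt data instead.
SCRIPT_OF = {
    "\u0065": "Latin",      # LATIN SMALL LETTER E
    "\u0435": "Cyrillic",   # CYRILLIC SMALL LETTER IE
    "\u0308": "Inherited",  # COMBINING DIAERESIS
    "\u0020": "Common",     # SPACE
}


def resolve_scripts(text):
    """Return (character, script) pairs for text.

    A character whose table entry is "Inherited" takes the resolved script
    of the character before it. Characters missing from the toy table get
    None, and an inherited mark with no usable predecessor stays "Inherited".
    """
    resolved = []
    previous = None  # resolved script of the preceding character, if any
    for ch in text:
        script = SCRIPT_OF.get(ch)
        if script == "Inherited" and previous is not None:
            script = previous
        resolved.append((ch, script))
        previous = script
    return resolved


latin = resolve_scripts("\u0065\u0308")     # e + combining diaeresis
cyrillic = resolve_scripts("\u0435\u0308")  # Cyrillic ie + combining diaeresis
print(latin[1][1], cyrillic[1][1])          # Latin Cyrillic
```

The same code point U+0308 is thus reported as Latin in the first sequence and as Cyrillic in the second, matching the two cases described above.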