Script (Unicode): Difference between revisions

Revision as of 11:22, 13 May 2010 by DePiep (→References: Unicode 5.0->5.2), compared with the latest revision as of 20:09, 13 May 2025 by 2601:645:8003:4e0:64f3:c141:6ed2:4276 (→External links: Updated URL for SEI website)
(165 intermediate revisions by 72 users not shown)
Line 1: {{short description\|Subset of characters in Unicode}} ~~{{Unicode_scripts}}~~ {{More citations needed\|date=June 2024}} In [[Unicode]], a '''script''' is a collection of letters and other written signs used to represent textual information in one or more writing systems.<ref>[http://unicode.org/glossary/ Glossary of Unicode Terms]</ref> For example, the [[Latin characters in Unicode\|Latin]] script supports alphabets such as [[English language\|English]], [[French language\|French]], [[Vietnamese language\|Vietnamese]] and many others. Some scripts support only one writing system and language, for example [[Armenian language\|Armenian]]. Other scripts, like [[Latin characters in Unicode\|Latin]], support many different writing systems: [[English alphabet\|English]], [[French alphabet\|French]], [[German alphabet\|German]], [[Italian alphabet\|Italian]], and [[Latin alphabet\|Latin]], to name just some of the alphabets supported by the Latin script. Some languages also make use of multiple alternate writing systems. [[Turkish language\|Turkish]], for example, used the [[Ottoman Turkish alphabet\|Arabic]] script before the 20th century and transitioned to Latin in the early part of the 20th century. For a list of languages supported by each script, see the [[list of languages by writing system]]. 
{{about\|writing systems found in Unicode\|the "Script" style of Latin letters in Unicode\|Mathematical Alphanumeric Symbols\|and\|Script typeface}} <!-- [[File:Armenian language in the Armenian alphabet.svg\|thumb\|[[Armenian script]]]] --> {{ISO 15924/unicode-script-illustration}} In [[Unicode]], a '''script''' is a collection of [[Letter (alphabet)\|letter]]s and other written signs used to represent textual information in one or more [[writing system]]s.<ref>{{cite web\|url=http://unicode.org/glossary/\|title=Glossary\|website=unicode.org}}</ref> Some scripts support only one writing system and [[Written language\|language]], for example, [[Armenian language\|Armenian]]. Other scripts support many different writing systems; for example, the [[Latin script in Unicode\|Latin script]] supports [[English alphabet\|English]], [[French alphabet\|French]], [[German alphabet\|German]], [[Italian alphabet\|Italian]], [[Vietnamese language\|Vietnamese]], [[Latin alphabet\|Latin]] itself, and several other languages. Some languages make use of multiple alternate writing systems and thus also use several scripts; for example, in [[Turkish language\|Turkish]], the [[Ottoman Turkish alphabet\|Arabic]] script was used before the 20th century but transitioned to Latin in the early part of the 20th century. More or less complementary to scripts are [[Unicode symbols\|symbols]] and Unicode [[control character]]s. When multiple languages make use of the same script, there are frequently some differences: particularly in diacritics and other marks. For example, Swedish and English both use the Latin script. However, [[Swedish alphabet\|Swedish]] includes the character ‘å’ (sometimes called a “Swedish O”) while English has no such character. Nor does English make use of the diacritic combining circle above for any character. In general the languages sharing the same scripts share many of the same characters. 
Despite these peripheral differences in the Swedish and English writing systems, they are said to use the same Latin script. So the Unicode abstraction of scripts is a basic organizing technique. The differences between different alphabets or writing systems remain and are supported through Unicode’s flexible scripts, combining marks and collation algorithms. The unified [[Combining Diacritical Marks for Symbols\|diacritical character]]s and unified [[General Punctuation\|punctuation characters]] frequently have the "common" or "inherited" script property. However, the individual scripts often have their own [[punctuation]] and [[diacritic]]s, so that many scripts include not only letters but also diacritic and other marks, punctuation, numerals and even their own idiosyncratic symbols and [[Space (punctuation)\|space]] characters. ~~Complementary are the '''[[Unicode symbols]]''': scripts and symbols cover all Unicode characters.~~ The unified diacritical characters and unified punctuation characters frequently have the “common” or “inherited” script property. However, the individual scripts often have their own punctuation and diacritics. As a result, many scripts include not only letters, but also diacritic and other marks, punctuation, numerals and even their own idiosyncratic symbols and space characters. 
Unicode {{Unicode version\|version=16.0}} defines 168 separate scripts, including 99 modern scripts and 69 ancient or historic scripts.<ref>{{cite web\|url=https://www.unicode.org/Public/UNIDATA/Scripts.txt\|title=Unicode Character Database: Scripts\|website=unicode.org}}</ref><ref>{{cite book \| title = The Unicode Standard, Version 15.0 \| chapter = Chapter 14: Additional Ancient and Historic Scripts \| publisher = Unicode, Inc \| date = September 2022 \| location = Mountain View, CA \| url = https://www.unicode.org/versions/Unicode15.0.0/ch14.pdf \| isbn = 978-1-936213-32-0 }}</ref> More scripts are in the process of being encoded or have been tentatively allocated for encoding in roadmaps.<ref>https://www.unicode.org/roadmaps/ Roadmaps to Unicode</ref> Unicode 5.2 includes 90 modern and historic scripts supporting hundreds or even thousands of languages throughout the world. Unicode is actively working on many more, as indicated by its [[Unicode#Unicode roadmap\|roadmap]]. == Definition and classification == ~~== Writing system ==~~ When multiple languages make use of the same script, there are frequently some differences, particularly in diacritics and other marks. For example, Swedish and English both use the Latin script. However, [[Swedish alphabet\|Swedish]] includes the character ''[[å]]'' (sometimes called a Swedish ''O''), while English has no such character. Nor does English make use of the diacritic ''[[Ring (diacritic)#Overring\|combining ring above]]'' for any character. In general, the languages sharing the same scripts share many of the same characters. Despite these peripheral differences in the Swedish and English writing systems, they are said to use the same Latin script. Thus, the Unicode abstraction of scripts is a basic organizing technique. The differences among different alphabets or writing systems remain and are supported through Unicode’s flexible scripts, combining marks and collation algorithms. 
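The relationship between ''å'' and the combining ring above can be checked with a short Python sketch. It uses only the standard <code>unicodedata</code> module; the variable names are illustrative and not part of any Unicode API. Canonical decomposition (NFD) splits the precomposed Swedish letter into a plain Latin ''a'' followed by the combining mark, and canonical composition (NFC) joins them again.

```python
import unicodedata

# U+00E5 LATIN SMALL LETTER A WITH RING ABOVE, the Swedish letter discussed above
a_ring = "\u00e5"

# NFD splits the precomposed letter into a base letter plus a combining mark
decomposed = unicodedata.normalize("NFD", a_ring)
for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# NFC recombines the pair into the single precomposed code point
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == a_ring)
```

The base letter is the same ''a'' that English uses; only the mark differs, which is why both languages are treated as using the one Latin script.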
~~{{main\|Writing system}}~~ === Script versus writing system === '''Writing system''' is sometimes treated as a synonym for script. However it also can be used as the specific concrete writing system supported by a script. For example the Vietnamese writing system is supported by the Latin script. A writing system may also cover more than one script, for example the Japanese writing system makes use of the [[Kanji\|Han]], [[Hiragana]] and [[Katakana]] scripts. ''[[Writing system]]'' is sometimes treated as a synonym for "script". However, it also can be used as the specific concrete writing system supported by a script. For example, the [[Vietnamese alphabet\|Vietnamese writing system]] is supported by the Latin script. A writing system may also cover more than one script; for example, the Japanese writing system makes use of the [[Kanji\|Han]], [[Hiragana]] and [[Katakana]] scripts. Most writing systems can be broadly divided into several categories: '''logographic''', '''syllabic''', '''alphabetic''' (or '''segmental'''), '''abugida''', '''abjad''' and '''featural'''; however, all features of any of these may be found in any given writing system in varying proportions, often making it difficult to purely categorize a system. The term '''complex system''' is sometimes used to describe those where the admixture makes classification problematic. Most writing systems can be broadly divided into several categories: '''logographic''', '''syllabic''', '''alphabetic''' (or '''segmental'''), '''abugida''', '''abjad''' and '''featural'''; however, all features of any of these may be found in any given writing system in varying proportions, often making it difficult to purely categorize a system. The term ''[[complex system]]'' is sometimes used to describe those where the admixture makes classification problematic. ~~{\| class="wikitable"~~ ~~! Type of writing system !! What each symbol represents !! 
Example~~ \|- ~~\| [[Logogram\|Logographic]] \|\| [[morpheme]] \|\| [[Chinese character]]s~~ \|- ~~\| [[Syllabary\|Syllabic]] \|\| syllable \|\| Japanese ''[[kana]]''~~ \|- ~~\| [[Alphabet]]ic \|\| [[phoneme]] (consonant or vowel) \|\| [[Latin alphabet]]~~ \|- ~~\| [[Abugida]] \|\| phoneme (consonant+vowel) \|\| Indian ''[[Devanāgarī]]''~~ \|- ~~\| [[Abjad]] \|\| phoneme (consonant) \|\| [[Arabic alphabet]]~~ \|- ~~\| [[Featural alphabet\|Featural]] \|\| phonetic feature \|\| Korean ''[[hangul]]''~~ \|} Unicode supports all of these types of writing systems through its numerous scripts. Unicode also adds further properties to characters to help differentiate the various characters and the ways they behave within Unicode text-processing algorithms. ~~''See also'': [[phonemic orthography\|''phonemic'' and ''phonetic'' orthography]].~~ === {{anchor\|Common and inherited scripts}}{{anchor\|Special script property values}}Special script property values === Unicode supports all of these types of writing systems through its numerous scripts. Unicode also adds further properties to characters to help differentiate the various characters and the ways they behave within Unicode text processing algorithms. In addition to explicit or specific script properties, Unicode uses three special values:<ref name=Unicode_script_property>{{cite web\|url=https://www.unicode.org/reports/tr24/\|title=UAX #24: Unicode Script Property\|website=www.unicode.org}}</ref> ;Common: Unicode can assign a character in the [[Universal Character Set\|UCS]] to a single script only. However, many characters—those that are not part of a formal natural-language writing system or are unified across many writing systems—may be used in more than one script (for example, currency signs, symbols, numerals and punctuation marks). In these cases Unicode defines them as belonging to the "common" script ([[ISO 15924]] code "Zyyy"). 
;Inherited: Many diacritics and non-spacing combining characters may be applied to characters from more than one script. In these cases Unicode assigns them to the "inherited" script (ISO 15924 code Zinh), which means that they have the same script class as the base character with which they combine, and so in different contexts they may be treated as belonging to different scripts. For example, {{unichar\|0308\|Combining Diaeresis\|cwith=}} may combine either with {{unichar\|0065\|Latin Small Letter E}} to create a Latin ''ë'' or with {{unichar\|0435\|Cyrillic Small Letter IE}} for the [[Cyrillic alphabet\|Cyrillic]] ''ё''. In the former case, it inherits the Latin script of the base character, whereas in the latter case, it inherits the Cyrillic script of the base character. ;Unknown: The value of "unknown" script (ISO 15924 code Zzzz) is given to unassigned, private-use, noncharacter, and surrogate code points. == ~~Table of Unicode scripts~~ Character categories within scripts == Unicode provides a general category property for each character. So, in addition to belonging to a script, every character also has a general category. Typically, scripts include letter characters: uppercase letters, lowercase letters and modifier letters. Some characters are considered titlecase letters for a few [[Precomposed character\|precomposed]] ligatures such as ǲ (U+01F2). Such titlecase ligatures are all in the Latin and Greek scripts and are all [[Unicode compatibility characters\|compatibility characters]], and therefore Unicode discourages their use by authors. It is unlikely that new titlecase letters will be added in the future. Most writing systems do not differentiate between uppercase and lowercase letters. For those scripts all letters are categorized as "other letter" or "modifier letter". Ideographs such as Unihan ideographs are also categorized as "other letters". 
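The combining diaeresis example and the general categories mentioned above can be inspected with Python's standard <code>unicodedata</code> module. This is a minimal sketch: the module does not expose the Script property itself, so it only shows that one combining mark composes with base letters from two different scripts, and reports the general category of a few sample characters. The helper names <code>describe</code> and <code>compose</code> and the choice of samples are illustrative assumptions.

```python
import unicodedata

DIAERESIS = "\u0308"  # COMBINING DIAERESIS, script property "Inherited"

def describe(ch):
    """Return code point, general category and name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}"

def compose(base):
    """Compose a base letter with the combining diaeresis (NFC)."""
    return unicodedata.normalize("NFC", base + DIAERESIS)

# The same mark attaches to a Latin base and to a Cyrillic base
latin = compose("\u0065")     # LATIN SMALL LETTER E + diaeresis
cyrillic = compose("\u0435")  # CYRILLIC SMALL LETTER IE + diaeresis
print(describe(latin))
print(describe(cyrillic))

# Sample characters: uppercase, lowercase, titlecase, ideograph,
# modifier letter, and the combining mark itself
samples = ["A", "a", "\u01f2", "\u4e2d", "\u02b0", DIAERESIS]
categories = [unicodedata.category(ch) for ch in samples]
for ch in samples:
    print(describe(ch))
```

The two-letter codes printed by <code>unicodedata.category</code> are the general category abbreviations: <code>Lu</code>, <code>Ll</code> and <code>Lt</code> for uppercase, lowercase and titlecase letters, <code>Lo</code> for "other letter", <code>Lm</code> for "modifier letter", and <code>Mn</code> for a non-spacing mark.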
A few scripts do differentiate between uppercase and lowercase however: Latin, Cyrillic, Greek, Armenian, Georgian, and Deseret. Even for these scripts there are some letters that are neither uppercase nor lowercase. ~~The following table lists the 90 scripts that are defined in Unicode 5.2.<ref>[http://www.unicode.org/Public/UNIDATA/Scripts.txt Unicode Character Database : Scripts]</ref>~~ Scripts can also contain any other general category character such as '''marks''' (diacritic and otherwise), '''numbers''' (numerals), '''punctuation''', '''separators''' (word separators such as spaces), '''symbols''' and non-graphical '''format''' characters. These are included in a particular script when they are unique to that script. Other such characters are generally unified and included in the punctuation or diacritic blocks. However, the bulk of characters in any script (other than the common and inherited scripts) are letters. ~~{\| class="wikitable sortable"~~ \|- ~~! Unicode script name~~ ~~! Relevant Wikipedia article(s)~~ ~~! [[ISO 15924]] code<ref>[http://www.unicode.org/iso15924/ ISO 15924 Registration Authority]</ref>~~ ~~! Number of characters (as of Unicode 5.2)~~ ~~! 
Version of Unicode first encoded~~ \|- ~~\| Common~~ \| ~~\| Zyyy~~ ~~\| 5,395~~ \| \|- ~~\| Inherited~~ \| ~~\| Qaai~~ ~~\| 523~~ \| \|- ~~\| Arabic~~ ~~\| [[Arabic alphabet]]~~ ~~\| Arab~~ ~~\| 1,030~~ ~~\| 1.0~~ \|- ~~\| Armenian~~ ~~\| [[Armenian alphabet]]~~ ~~\| Armn~~ ~~\| 90~~ ~~\| 1.0~~ \|- ~~\| Avestan~~ ~~\| [[Avestan alphabet]]~~ ~~\| Avst~~ ~~\| 61~~ ~~\| 5.2~~ \|- ~~\| Balinese~~ ~~\| [[Balinese script]]~~ ~~\| Bali~~ ~~\| 121~~ ~~\| 5.0~~ \|- ~~\| Bamum~~ ~~\| [[Bamum language]]~~ ~~\| Bamu~~ ~~\| 88~~ ~~\| 5.2~~ \|- ~~\| Bengali~~ ~~\| [[Bengali script]]~~ ~~\| Beng~~ ~~\| 92~~ ~~\| 1.0~~ \|- ~~\| Bopomofo~~ ~~\| [[Zhuyin]]~~ ~~\| Bopo~~ ~~\| 65~~ ~~\| 1.0~~ \|- ~~\| Braille~~ ~~\| [[Braille]]~~ ~~\| Brai~~ ~~\| 256~~ ~~\| 3.0~~ \|- ~~\| Buginese~~ ~~\| [[Lontara script]]~~ ~~\| Bugi~~ ~~\| 30~~ ~~\| 4.1~~ \|- ~~\| Buhid~~ ~~\| [[Buhid script]]~~ ~~\| Buhd~~ ~~\| 20~~ ~~\| 3.2~~ \|- ~~\| Canadian Aboriginal~~ ~~\| [[Canadian Aboriginal syllabics]]~~ ~~\| Cans~~ ~~\| 710~~ ~~\| 3.0~~ \|- ~~\| Carian~~ ~~\| [[Carian script]]~~ ~~\| Cari~~ ~~\| 49~~ ~~\| 5.1~~ \|- ~~\| Cham~~ ~~\| [[Cham alphabet]]~~ ~~\| Cham~~ ~~\| 83~~ ~~\| 5.1~~ \|- ~~\| Cherokee~~ ~~\| [[Cherokee syllabary]]~~ ~~\| Cher~~ ~~\| 85~~ ~~\| 3.0~~ \|- ~~\| Coptic~~ ~~\| [[Coptic alphabet]]~~ ~~\| Copt~~ ~~\| 135~~ ~~\| 1.0 (disunified from Greek in 4.1)~~ \|- ~~\| Cuneiform~~ ~~\| [[Cuneiform script]]~~ ~~\| Xsux~~ ~~\| 982~~ ~~\| 5.0~~ \|- ~~\| Cypriot~~ ~~\| [[Cypriot syllabary]]~~ ~~\| Cprt~~ ~~\| 55~~ ~~\| 4.0~~ \|- ~~\| Cyrillic~~ ~~\| [[Cyrillic alphabet]]~~ ~~\| Cyrl~~ ~~\| 404~~ ~~\| 1.0~~ \|- ~~\| Deseret~~ ~~\| [[Deseret alphabet]]~~ ~~\| Dsrt~~ ~~\| 80~~ ~~\| 3.1~~ \|- ~~\| Devanagari~~ ~~\| [[Devanagari script]]~~ ~~\| Deva~~ ~~\| 140~~ ~~\| 1.0~~ \|- ~~\| Egyptian Hieroglyphs~~ ~~\| [[Egyptian hieroglyphs]]~~ ~~\| Egyp~~ ~~\| 1,071~~ ~~\| 5.2~~ \|- ~~\| Ethiopic~~ ~~\| [[Ge'ez alphabet]]~~ ~~\| Ethi~~ ~~\| 461~~ ~~\| 3.0~~ \|- ~~\| Georgian~~ ~~\| [[Georgian alphabet]]~~ 
~~\| Geor~~ ~~\| 120~~ ~~\| 1.0~~ \|- ~~\| Glagolitic~~ ~~\| [[Glagolitic alphabet]]~~ ~~\| Glag~~ ~~\| 94~~ ~~\| 4.1~~ \|- ~~\| Gothic~~ ~~\| [[Gothic alphabet]]~~ ~~\| Goth~~ ~~\| 27~~ ~~\| 3.1~~ \|- ~~\| Greek~~ ~~\| [[Greek alphabet]]~~ ~~\| Grek~~ ~~\| 511~~ ~~\| 1.0~~ \|- ~~\| Gujarati~~ ~~\| [[Gujarati script]]~~ ~~\| Gujr~~ ~~\| 83~~ ~~\| 1.0~~ \|- ~~\| Gurmukhi~~ ~~\| [[Gurmukhi script]]~~ ~~\| Guru~~ ~~\| 79~~ ~~\| 1.0~~ \|- ~~\| Han~~ ~~\| [[Chinese character]], [[Kanji]], [[Hanja]], [[Hán tự]]~~ ~~\| Hani~~ ~~\| 75,738~~ ~~\| 1.0~~ \|- ~~\| Hangul~~ ~~\| [[Hangul]]~~ ~~\| Hang~~ ~~\| 11,737~~ ~~\| 1.0 (Hangul syllables relocated in 2.0)~~ \|- ~~\| Hanunoo~~ ~~\| [[Hanunó'o script]]~~ ~~\| Hano~~ ~~\| 21~~ ~~\| 3.2~~ \|- ~~\| Hebrew~~ ~~\| [[Hebrew alphabet]]~~ ~~\| Hebr~~ ~~\| 133~~ ~~\| 1.0~~ \|- ~~\| Hiragana~~ ~~\| [[Hiragana]]~~ ~~\| Hira~~ ~~\| 90~~ ~~\| 1.0~~ \|- ~~\| Imperial Aramaic~~ ~~\| [[Aramaic language#Imperial Aramaic\|Aramaic language]]~~ ~~\| Armi~~ ~~\| 31~~ ~~\| 5.2~~ \|- ~~\| Inscriptional Pahlavi~~ ~~\| [[Pahlavi scripts#Inscriptional Pahlavi\|Pahlavi scripts]]~~ ~~\| Phli~~ ~~\| 27~~ ~~\| 5.2~~ \|- ~~\| Inscriptional Parthian~~ ~~\| [[Parthian language#Written Parthian\|Parthian language]]~~ ~~\| Prti~~ ~~\| 30~~ ~~\| 5.2~~ \|- ~~\| Javanese~~ ~~\| [[Javanese script]]~~ ~~\| Java~~ ~~\| 91~~ ~~\| 5.2~~ \|- ~~\| Kaithi~~ ~~\| [[Kaithi]]~~ ~~\| Kthi~~ ~~\| 66~~ ~~\| 5.2~~ \|- ~~\| Kannada~~ ~~\| [[Kannada script]]~~ ~~\| Knda~~ ~~\| 84~~ ~~\| 1.0~~ \|- ~~\| Katakana~~ ~~\| [[Katakana]]~~ ~~\| Kana~~ ~~\| 299~~ ~~\| 1.0~~ \|- ~~\| Kayah Li~~ ~~\| [[Kayah Li script]]~~ ~~\| Kali~~ ~~\| 48~~ ~~\| 5.1~~ \|- ~~\| Kharoshthi~~ ~~\| [[Kharoṣṭhī]]~~ ~~\| Khar~~ ~~\| 65~~ ~~\| 4.1~~ \|- ~~\| Khmer~~ ~~\| [[Khmer script]]~~ ~~\| Khmr~~ ~~\| 146~~ ~~\| 3.0~~ \|- ~~\| Lao~~ ~~\| [[Lao script]]~~ ~~\| Laoo~~ ~~\| 65~~ ~~\| 1.0~~ \|- ~~\| Latin~~ ~~\| [[Latin alphabet]]~~ ~~\| Latn~~ ~~\| 1,244~~ ~~\| 1.0~~ \|- ~~\| Lepcha~~ ~~\| [[Lepcha 
script]]~~ ~~\| Lepc~~ ~~\| 74~~ ~~\| 5.1~~ \|- ~~\| Limbu~~ ~~\| [[Limbu script]]~~ ~~\| Limb~~ ~~\| 66~~ ~~\| 4.0~~ \|- ~~\| Linear B~~ ~~\| [[Linear B]]~~ ~~\| Linb~~ ~~\| 211~~ ~~\| 4.0~~ \|- ~~\| Lisu~~ ~~\| [[Fraser alphabet]]~~ ~~\| Lisu~~ ~~\| 48~~ ~~\| 5.2~~ \|- ~~\| Lycian~~ ~~\| [[Lycian script]]~~ ~~\| Lyci~~ ~~\| 29~~ ~~\| 5.1~~ \|- ~~\| Lydian~~ ~~\| [[Lydian script]]~~ ~~\| Lydi~~ ~~\| 27~~ ~~\| 5.1~~ \|- ~~\| Malayalam~~ ~~\| [[Malayalam script]]~~ ~~\| Mlym~~ ~~\| 95~~ ~~\| 1.0~~ \|- ~~\| Meetei Mayek~~ ~~\| [[Meitei Mayek script]]~~ ~~\| Mtei~~ ~~\| 56~~ ~~\| 5.2~~ \|- ~~\| Mongolian~~ ~~\| [[Mongolian script]], [[Clear script]], [[Manchu alphabet]]~~ ~~\| Mong~~ ~~\| 153~~ ~~\| 3.0~~ \|- ~~\| Myanmar~~ ~~\| [[Burmese script]]~~ ~~\| Mymr~~ ~~\| 188~~ ~~\| 3.0~~ \|- ~~\| N'Ko~~ ~~\| [[N'Ko]]~~ ~~\| Nkoo~~ ~~\| 59~~ ~~\| 5.0~~ \|- ~~\| New Tai Lue~~ ~~\| [[New Tai Lue]]~~ ~~\| Talu~~ ~~\| 83~~ ~~\| 4.1~~ \|- ~~\| Ogham~~ ~~\| [[Ogham]]~~ ~~\| Ogam~~ ~~\| 29~~ ~~\| 3.0~~ \|- ~~\| Ol Chiki~~ ~~\| [[Ol Chiki script]]~~ ~~\| Olck~~ ~~\| 48~~ ~~\| 5.1~~ \|- ~~\| Old Italic~~ ~~\| [[Old Italic alphabet]]~~ ~~\| Ital~~ ~~\| 35~~ ~~\| 3.1~~ \|- ~~\| Old Persian~~ ~~\| [[Old Persian cuneiform script]]~~ ~~\| Xpeo~~ ~~\| 50~~ ~~\| 4.1~~ \|- ~~\| Old South Arabian~~ ~~\| [[South Arabian alphabet]]~~ ~~\| Sarb~~ ~~\| 32~~ ~~\| 5.2~~ \|- ~~\| Old Turkic~~ ~~\| [[Old Turkic script]]~~ ~~\| Orkh~~ ~~\| 73~~ ~~\| 5.2~~ \|- ~~\| Oriya~~ ~~\| [[Oriya script]]~~ ~~\| Orya~~ ~~\| 84~~ ~~\| 1.0~~ \|- ~~\| Osmanya~~ ~~\| [[Osmanya script]]~~ ~~\| Osma~~ ~~\| 40~~ ~~\| 4.0~~ \|- ~~\| Phags-pa~~ ~~\| [['Phags-pa script]]~~ ~~\| Phag~~ ~~\| 56~~ ~~\| 5.0~~ \|- ~~\| Phoenician~~ ~~\| [[Phoenician alphabet]]~~ ~~\| Phnx~~ ~~\| 29~~ ~~\| 5.0~~ \|- ~~\| Rejang~~ ~~\| [[Rejang script]]~~ ~~\| Rjng~~ ~~\| 37~~ ~~\| 5.1~~ \|- ~~\| Runic~~ ~~\| [[Runic alphabet]]~~ ~~\| Runr~~ ~~\| 78~~ ~~\| 3.0~~ \|- ~~\| Samaritan~~ ~~\| [[Samaritan script]]~~ ~~\| Samr~~ ~~\| 61~~ ~~\| 5.2~~ 
\|- ~~\| Saurashtra~~ ~~\| [[Saurashtra script]]~~ ~~\| Saur~~ ~~\| 81~~ ~~\| 5.1~~ \|- ~~\| Shavian~~ ~~\| [[Shavian alphabet]]~~ ~~\| Shaw~~ ~~\| 48~~ ~~\| 4.0~~ \|- ~~\| Sinhala~~ ~~\| [[Sinhala script]]~~ ~~\| Sinh~~ ~~\| 80~~ ~~\| 3.0~~ \|- ~~\| Sundanese~~ ~~\| [[Sundanese script]]~~ ~~\| Sund~~ ~~\| 55~~ ~~\| 5.1~~ \|- ~~\| Syloti Nagri~~ ~~\| [[Sylheti Nagari]]~~ ~~\| Sylo~~ ~~\| 44~~ ~~\| 4.1~~ \|- ~~\| Syriac~~ ~~\| [[Syriac alphabet]]~~ ~~\| Syrc~~ ~~\| 77~~ ~~\| 3.0~~ \|- ~~\| Tagalog~~ ~~\| [[Baybayin]]~~ ~~\| Tglg~~ ~~\| 20~~ ~~\| 3.2~~ \|- ~~\| Tagbanwa~~ ~~\| [[Tagbanwa script]]~~ ~~\| Tagb~~ ~~\| 18~~ ~~\| 3.2~~ \|- ~~\| Tai Le~~ ~~\| [[Tai Nüa language#Writing_system\|Tai Nüa language]]~~ ~~\| Tale~~ ~~\| 35~~ ~~\| 4.0~~ \|- ~~\| Tai Tham~~ ~~\| [[Tai Tham script]]~~ ~~\| Lana~~ ~~\| 127~~ ~~\| 5.2~~ \|- ~~\| Tai Viet~~ ~~\| [[Tai Viet script]]~~ ~~\| Tavt~~ ~~\| 72~~ ~~\| 5.2~~ \|- ~~\| Tamil~~ ~~\| [[Tamil script]]~~ ~~\| Taml~~ ~~\| 72~~ ~~\| 1.0~~ \|- ~~\| Telugu~~ ~~\| [[Telugu script]]~~ ~~\| Telu~~ ~~\| 93~~ ~~\| 1.0~~ \|- ~~\| Thaana~~ ~~\| [[Tāna]]~~ ~~\| Thaa~~ ~~\| 50~~ ~~\| 3.0~~ \|- ~~\| Thai~~ ~~\| [[Thai alphabet]]~~ ~~\| Thai~~ ~~\| 86~~ ~~\| 1.0~~ \|- ~~\| Tibetan~~ ~~\| [[Tibetan script]]~~ ~~\| Tibt~~ ~~\| 201~~ ~~\| 1.0 (removed in 1.1 and reintroduced in 2.0)~~ \|- ~~\| Tifinagh~~ ~~\| [[Tifinagh]]~~ ~~\| Tfng~~ ~~\| 55~~ ~~\| 4.1~~ \|- ~~\| Ugaritic~~ ~~\| [[Ugaritic alphabet]]~~ ~~\| Ugar~~ ~~\| 31~~ ~~\| 4.0~~ \|- ~~\| Vai~~ ~~\| [[Vai syllabary]]~~ ~~\| Vaii~~ ~~\| 300~~ ~~\| 5.1~~ \|- ~~\| Yi~~ ~~\| [[Yi script]]~~ ~~\| Yiii~~ ~~\| 1,220~~ ~~\| 3.0~~ \|} == <span class="anchor" id="List of scripts in Unicode"></span> List of encoded scripts == ~~=== Common and inherited scripts ===~~ {{As of\|September 2024\|alt=As of version 16.0}}, Unicode defines 168 scripts (called "Alias" or "Property value alias") based on the ISO 15924 list. 
In addition, Unicode assigns the name "Common" to ISO 15924's {{code\|Zyyy}} code for undetermined scripts, "Inherited" to ISO 15924's {{code\|Zinh}} code for inherited scripts, and "Unknown" to ISO 15924's {{code\|Zzzz}} code for uncoded scripts. Some script codes defined by ISO 15924 are not used in Unicode, including {{code\|Zsym}} (Symbols) and {{code\|Zmth}} (Mathematical notation). {{ISO 15924 script codes and related Unicode data\|state=uncollapsed}} == Missing scripts in Unicode == Unicode assigns every character in the [[UCS]] to a single script only. However, many characters (those that are not part of a formal natural language writing system or are unified across many writing systems, e.g. most symbols including music notation, currency signs, etc., as well as some numerals and many punctuation marks) may be used in more than one script. In these cases Unicode defines them as belonging to the '''common''' script. The project Missing Scripts, with contributors from the [[Mainz University of Applied Sciences]], the Atelier national de recherche typographique (ANRT) in [[Nancy, France\|Nancy]], and the [[University of California, Berkeley]], has compiled a list of 131 scripts that have not yet been encoded in ''The Unicode Standard'', out of a total of 294 recognized scripts according to the current state of research.<ref>{{Cite web \|title=The World's Writing Systems \|url=https://www.worldswritingsystems.org/ \|access-date=2024-10-04 \|website=www.worldswritingsystems.org}}</ref> In addition, many diacritics and non-spacing combining characters may be applied to characters from more than one script, and in these cases Unicode assigns them to the '''inherited''' script, which means that they have the same script class as the base character with which they combine, and so in different contexts they may be treated as belonging to different scripts. 
For example, U+0308 Combining Diaeresis may combine with either U+0065 Latin Small Letter E (ë) or U+0435 Cyrillic Small Letter IE (ё); in the former case it inherits the Latin script of the preceding base character, whereas in the latter case it inherits the Cyrillic script of the preceding base character. ~~== Character categories within scripts ==~~ ~~{{UCS_characters}}~~ Unicode provides a general category property for each character. So, in addition to belonging to a script, every character also has a general category. Typically, scripts include letter characters: uppercase letters, lowercase letters and modifier letters. Some characters are considered titlecase letters for a few [[Precomposed character\|precomposed]] ligatures such as ǲ (U+01F2). Such titlecase ligatures are all in the Latin and Greek scripts and are all compatibility characters, and therefore Unicode discourages their use by authors. It is unlikely that new titlecase letters will be added in the future. Most writing systems do not differentiate between uppercase and lowercase letters. For those scripts all letters are categorized as “other letter” or “modifier letter”. Ideographs such as Unihan ideographs are also categorized as “other letters”. A few scripts do differentiate between uppercase and lowercase, however: Latin, Cyrillic, Greek, Armenian, Georgian, and Deseret. Even for these scripts there are some letters that are neither uppercase nor lowercase. Scripts can also contain any other general category character such as '''marks''' (diacritic and otherwise), '''numbers''' (numerals), '''punctuation''', '''separators''' (word separators such as spaces), '''symbols''' and non-graphical '''format''' characters. These are included in a particular script when they are unique to that script. Other such characters are generally unified and included in the punctuation or diacritic blocks. 
However, the bulk of characters in any script (other than the common and inherited scripts) are letters.

==See also==
* ~~[[Mapping of Unicode characters]]~~ [[Latin script in Unicode]]
* ~~[[Unicode Symbols]]~~ [[Unicode characters]]
* [[Unicode symbols]]
* [[Phonemic orthography\|Phonemic and phonetic orthography]]

==References==
{{Reflist}}
* [http://www.unicode.org/versions/Unicode5.2.0/ The Unicode Standard 5.2]

==External links==
* [http://www.unicode.org/reports/tr24/ Unicode Standard Annex #24 : Unicode Script Property]
* [https://sei.berkeley.edu/ Script Encoding Initiative], a project at UC Berkeley, USA, working to get more scripts included in the Unicode standard.
* [https://www.worldswritingsystems.org The World’s Writing Systems], an overview of all 294 known writing systems, each with a typographic reference glyph and their Unicode status.

{{Unicode navigation}}
{{Writing systems}}
[[Category:Unicode ~~Blocks~~\| Scripts]]