{{Unicode_scripts}}
In [[Unicode]], a '''script''' is an abstract, coherent and unified [[writing system]] that supports one or more concrete writing systems, which in turn support the written forms of one or more languages.{{Fact|date=May 2007}}
For example the [[
When multiple languages use the same script, there are often differences, particularly in diacritics and other marks. For example, Swedish and English both use the Latin script. However, [[Swedish_alphabet|Swedish]] includes the character ‘å’ (sometimes called a “Swedish O”), while English has no such character, nor does English use the combining ring above diacritic on any character. In general, languages that share a script share many of the same characters. Despite these peripheral differences between the Swedish and English writing systems, both are said to use the same Latin script. The Unicode abstraction of scripts is therefore a basic organizing technique: the differences between alphabets or writing systems remain, and they are supported through Unicode’s flexible scripts, combining marks and collation algorithms.
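A minimal sketch using Python's standard `unicodedata` module: the Swedish ‘å’ can be encoded either as one precomposed character or as a Latin ‘a’ followed by the combining ring above, and canonical normalization converts between the two forms. The helper `describe` is a hypothetical name chosen for illustration.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode character name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


precomposed = "\u00e5"  # å as a single precomposed character
decomposed = unicodedata.normalize("NFD", precomposed)  # base letter + combining mark

print(describe(precomposed))
# [('U+00E5', 'LATIN SMALL LETTER A WITH RING ABOVE')]
print(describe(decomposed))
# [('U+0061', 'LATIN SMALL LETTER A'), ('U+030A', 'COMBINING RING ABOVE')]

# The combining mark has a nonzero canonical combining class (230 = "above").
print(unicodedata.combining(decomposed[1]))  # 230

# NFC recomposes the pair, so the two spellings are canonically equivalent.
assert unicodedata.normalize("NFC", decomposed) == precomposed
```

The base letter ‘a’ is shared by both languages; only the combining mark is extra, which is one way a single Latin script can accommodate language-specific letters.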
While every character has a script property, many characters, such as symbols, have the value “Common” or “Inherited” for it. The unified diacritical characters and unified punctuation characters frequently have the “Common” or “Inherited” script property. However, individual scripts often have their own punctuation and diacritics. Consequently, many scripts include not only letters but also diacritics and other marks, punctuation, numerals, and even their own idiosyncratic symbols and space characters.
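Python's standard `unicodedata` module does not expose the script property itself, but its general category lookup is enough for a rough sketch of the last point: characters named after particular writing systems include punctuation, digits, combining marks and spaces, not only letters. The sample characters and the helper `profile` are hypothetical choices for illustration; some shared punctuation, such as the Devanagari danda, may carry the “Common” script value even though its name mentions a single script.

```python
import unicodedata

# Illustrative non-letter characters whose Unicode names tie them to a writing system.
SAMPLES = [
    "\u060C",  # ARABIC COMMA
    "\u0964",  # DEVANAGARI DANDA
    "\u0663",  # ARABIC-INDIC DIGIT THREE
    "\u0969",  # DEVANAGARI DIGIT THREE
    "\u093E",  # DEVANAGARI VOWEL SIGN AA
    "\u3000",  # IDEOGRAPHIC SPACE
]


def profile(chars):
    """Return (code point, general category, name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch)) for ch in chars]


for row in profile(SAMPLES):
    # Po = other punctuation, Nd = decimal digit,
    # Mc = spacing combining mark, Zs = space separator
    print(*row)

# Digits from different writing systems share the same numeric value.
assert unicodedata.decimal("\u0663") == unicodedata.decimal("\u0969") == 3
```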
Unicode already includes over 60 scripts supporting hundreds or even thousands of languages throughout the world. Unicode is actively working on many more, as indicated by its [[Unicode#
== Writing system ==
| [[Syllabary|Syllabic]] || syllable || Japanese ''[[kana]]''
|-
| [[Alphabet
|-
| [[Abugida]] || phoneme (consonant+vowel) || Indian ''[[Devanāgarī]]''
|-
| [[Abjad]] || phoneme (consonant) || [[Arabic alphabet]]
|-
| [[
|}
== Character categories within scripts ==
{{UCS_characters}}
Unicode assigns a general category property to every character, so in addition to belonging to a script, every character also has a general category. The letter characters of a script typically fall into categories such as uppercase letters, lowercase letters and modifier letters. Some characters are considered titlecase letters for a few [[
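A minimal sketch, again using Python's standard `unicodedata` module, that looks up the two-letter general category code of one sample letter per letter category. The sample characters and the `LETTER_CATEGORIES` label table are illustrative choices, not part of any Unicode API.

```python
import unicodedata

# Human-readable labels for the letter-related general category codes.
LETTER_CATEGORIES = {
    "Lu": "uppercase letter",
    "Ll": "lowercase letter",
    "Lt": "titlecase letter",
    "Lm": "modifier letter",
    "Lo": "other letter",
}

# One illustrative sample per category.
samples = [
    "A",       # LATIN CAPITAL LETTER A
    "a",       # LATIN SMALL LETTER A
    "\u01C5",  # ǅ LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON (a digraph)
    "\u02B0",  # ʰ MODIFIER LETTER SMALL H
    "\u3042",  # あ HIRAGANA LETTER A
]

for ch in samples:
    code = unicodedata.category(ch)
    print(f"U+{ord(ch):04X} {code} {LETTER_CATEGORIES.get(code, 'not a letter')}")
```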
Most scripts do not differentiate between uppercase and lowercase letters; in those scripts, all letters are categorized as “other letter” or “modifier letter”. Ideographs such as the Unihan ideographs are also categorized as “other letter”. A few scripts do differentiate between uppercase and lowercase, however: Latin, Cyrillic, Greek, Armenian, Georgian and Deseret. Even in these scripts, some letters are neither uppercase nor lowercase.
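The sketch below checks whether a letter has a case counterpart by using Python's built-in `str.upper` and `str.lower`, whose mappings come from the Unicode character database bundled with the interpreter, so results may vary with the Unicode version. The helper `has_case_mapping` and the sample letters are hypothetical choices for illustration.

```python
import unicodedata


def has_case_mapping(ch):
    """True if Python's Unicode case mappings change ch in either direction."""
    return ch.upper() != ch or ch.lower() != ch


samples = [
    "\u03BB",      # Greek λ
    "\u0434",      # Cyrillic д
    "\u0561",      # Armenian ա
    "\U00010428",  # Deseret small letter long I
    "\u0915",      # Devanagari क
    "\u0628",      # Arabic ب
    "\u4E2D",      # Unihan ideograph 中
    "\u01C0",      # ǀ LATIN LETTER DENTAL CLICK: Latin script, but uncased
]

for ch in samples:
    print(unicodedata.name(ch), unicodedata.category(ch), has_case_mapping(ch))
```

The last sample shows the point made above: a letter in a cased script such as Latin can still be categorized as “other letter” with no uppercase or lowercase form.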