Script (Unicode): Difference between revisions

Indexheavy (talk | contribs)
See also: added reference
SmackBot (talk | contribs)
m Date/fix the maintenance tags or gen fixes
Line 1:
{{Merge|Mapping of Unicode characters|date=May 2007}}
{{Unreferenced|date=May 2007}}
{{Unicode_scripts}}
In [[Unicode]], a '''script''' is an abstract, coherent and unified [[writing system]] that supports one or more concrete writing systems, which in turn support the written forms of one or more languages.{{Fact|date=May 2007}}
For example, the [[Latin_characters_in_Unicode|Latin]] script supports the alphabets of [[English language|English]], [[French language|French]], [[Vietnamese language|Vietnamese]] and many other languages. Some scripts support only one writing system and language, for example [[Armenian language|Armenian]]. Other scripts, like [[Latin_characters_in_Unicode|Latin]], support many different writing systems: the [[English_alphabet|English]], [[French_alphabet|French]], [[German_alphabet|German]], [[Italian_alphabet|Italian]], and [[Latin_alphabet|Latin]] alphabets, to name just a few. Some languages have also been written in more than one script. [[Turkish language|Turkish]], for example, was written in the [[Ottoman_Turkish_alphabet|Arabic]] script before the 20th century and switched to the Latin script in the early 20th century. For a list of languages supported by each script, see the [[List_of_languages_by_writing_system|list of languages by writing system]].
 
Line 39:
 
== Character categories within scripts ==
{{Template:UCS_characters}}
Unicode provides a general category property for each character, so in addition to belonging to a script, every character also has a general category. Scripts typically include several kinds of letter characters: uppercase letters, lowercase letters and modifier letters. A few [[Precomposed_character|precomposed]] ligatures, such as ǲ (U+01F2), are categorized as titlecase letters. Such titlecase ligatures all belong to the Latin and Greek scripts and are all compatibility characters, so Unicode discourages authors from using them. It is unlikely that new titlecase letters will be added in the future.
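
The general category of a character can be inspected programmatically. The following is a minimal sketch using Python's standard <code>unicodedata</code> module. The helper names <code>general_category</code> and <code>titlecase_letters</code> are illustrative and not part of any standard. Because the module does not expose the script property, the sketch takes the first word of each character name as a rough stand-in for its script, which is an assumption made for this example only.

```python
import sys
import unicodedata


def general_category(ch):
    """Return the two-letter general category code (e.g. 'Lu', 'Ll', 'Lt')."""
    return unicodedata.category(ch)


def titlecase_letters():
    """Collect every code point whose general category is 'Lt' (titlecase letter)."""
    return [
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == "Lt"
    ]


# One uppercase, one lowercase, one modifier and one titlecase letter.
samples = ["A", "a", "\u02b0", "\u01f2"]
for ch in samples:
    print(f"U+{ord(ch):04X}", general_category(ch), unicodedata.name(ch))

# Assumption: the first word of the character name approximates the script.
prefixes = {unicodedata.name(ch).split()[0] for ch in titlecase_letters()}
print(sorted(prefixes))
```

With the Unicode data bundled in current Python releases, the titlecase letters found this way all carry names beginning with LATIN or GREEK, which matches the statement above.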
 
Most writing systems do not differentiate between uppercase and lowercase letters. For those scripts, all letters are categorized as “other letter” or “modifier letter”. Ideographs such as Unihan ideographs are also categorized as “other letter”. A few scripts do differentiate between uppercase and lowercase, however: Latin, Cyrillic, Greek, Armenian, Georgian, and Deseret. Even in these scripts, some letters are neither uppercase nor lowercase.
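
The distinction between cased and uncased letters can be illustrated with a short sketch, again assuming Python's standard <code>unicodedata</code> module. The mapping <code>LETTER_KINDS</code> and the function <code>letter_kind</code> are hypothetical names chosen for this example.

```python
import unicodedata

# Human-readable labels for the five letter categories.
LETTER_KINDS = {
    "Lu": "uppercase letter",
    "Ll": "lowercase letter",
    "Lt": "titlecase letter",
    "Lm": "modifier letter",
    "Lo": "other letter",
}


def letter_kind(ch):
    """Describe the letter category of ch, or report that it is not a letter."""
    return LETTER_KINDS.get(unicodedata.category(ch), "not a letter")


samples = [
    "\u0531",  # ARMENIAN CAPITAL LETTER AYB (cased script)
    "\u0561",  # ARMENIAN SMALL LETTER AYB (cased script)
    "\u05d0",  # HEBREW LETTER ALEF (uncased script)
    "\u4e2d",  # CJK unified ideograph (Unihan)
    "\u01bb",  # LATIN LETTER TWO WITH STROKE (caseless letter in a cased script)
]
for ch in samples:
    print(f"U+{ord(ch):04X}", letter_kind(ch))
```

The last sample is a Latin letter that is neither uppercase nor lowercase, so it falls under “other letter” even though Latin is a cased script.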