Script (Unicode): Difference between revisions

Indexheavy (talk | contribs)
clarification on writing system /script and some spelling corrections
Indexheavy (talk | contribs)
== Character categories within scripts ==
{{Template:UCS_characters}}
Unicode provides a general category property for each character, so in addition to belonging to a script, every character also has a general category. Scripts typically include letter characters: uppercase letters, lowercase letters, and modifier letters. Some characters are considered titlecase letters; these are a few [[Precomposed_character|precomposed]] ligatures such as Dz (U+01F2). Such titlecase ligatures are all in the Latin and Greek scripts and are all compatibility characters, and therefore Unicode discourages their use by authors. It is unlikely that new titlecase letters will be added in the future.
 
Most writing systems do not differentiate between uppercase and lowercase letters. For those scripts, all letters are categorized as “other letter” or “modifier letter”. Ideographs such as Unihan ideographs are also categorized as “other letter”. A few scripts do differentiate between uppercase and lowercase, however: Latin, Cyrillic, Greek, Armenian, Georgian, and Deseret. Even in these scripts there are some letters that are neither uppercase nor lowercase.
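
The letter categories described above can be inspected with a short script. The following is a minimal sketch, assuming Python's standard <code>unicodedata</code> module; the helper name <code>general_category</code> and the choice of sample characters are illustrative only. Note that <code>unicodedata</code> reports the general category but not the script property, so the script of each sample is stated in a comment rather than looked up.

```python
import unicodedata


def general_category(ch: str) -> str:
    """Return the two-letter Unicode general category of one character."""
    return unicodedata.category(ch)


# Illustrative samples (hypothetical selection, not taken from the article).
samples = [
    ("A", "Latin capital letter A"),
    ("a", "Latin small letter a"),
    ("\u01F2", "titlecase ligature Dz (U+01F2)"),
    ("\u02B0", "modifier letter small h"),
    ("\u4E2D", "a Unihan ideograph"),
    ("\u05D0", "Hebrew letter alef (caseless script)"),
    ("\u01C0", "Latin letter dental click (no case)"),
]

for ch, description in samples:
    # Lu = uppercase, Ll = lowercase, Lt = titlecase,
    # Lm = modifier letter, Lo = other letter
    print(f"U+{ord(ch):04X}  {general_category(ch)}  {description}")
```

The last sample is a letter in a cased script (Latin) that is itself neither uppercase nor lowercase, so it is reported as “other letter”.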
However, scripts can also contain characters of any other general category when they are meant primarily to support the particular script. So scripts contain numbers (numerals), diacritics and other marks, punctuation, symbols and, in some cases, even formatting control characters.
 
Scripts can also contain characters of any other general category, such as '''numbers''' (numerals), '''punctuation''', '''marks''' (diacritic and otherwise), '''separators''', '''symbols''' and non-graphical '''format''' characters. These are included in a particular script when they are unique to that script. Other such characters are generally unified and included in the punctuation or diacritic blocks. However, the bulk of characters in any script are letters.
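
The non-letter categories can be illustrated in the same way. This is a rough sketch, again assuming Python's <code>unicodedata</code> module; the mapping <code>MAJOR_CLASSES</code> and the function <code>major_class</code> are hypothetical names introduced here, and the sample characters are chosen only as plausible examples of script-specific non-letters.

```python
import unicodedata

# First letter of the general category mapped to its major class.
MAJOR_CLASSES = {
    "L": "letter",
    "M": "mark",
    "N": "number",
    "P": "punctuation",
    "S": "symbol",
    "Z": "separator",
    "C": "other (control, format, ...)",
}


def major_class(ch: str) -> str:
    """Return the major class named by the first letter of the category."""
    return MAJOR_CLASSES[unicodedata.category(ch)[0]]


# Illustrative non-letter characters used with particular scripts.
samples = [
    ("\u0663", "Arabic-Indic digit three"),
    ("\u055E", "Armenian question mark"),
    ("\u093E", "Devanagari vowel sign AA"),
    ("\u1680", "Ogham space mark"),
    ("\u09F3", "Bengali rupee sign"),
    ("\u0600", "Arabic number sign"),
]

for ch, description in samples:
    category = unicodedata.category(ch)
    print(f"U+{ord(ch):04X}  {category}  {major_class(ch):<30}  {description}")
```

The format character in the last row falls under the general category <code>Cf</code>, which this sketch groups under the broad “other” class together with control characters.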
 
[[Category:Unicode]]