FlaxenHobbit (talk | contribs) Fixed minor comma errors. Tags: Mobile edit Mobile web edit |
FlaxenHobbit (talk | contribs) →Definition and classification: Further fixes of minor comma, hyphenation, and word-usage errors. Tags: Mobile edit Mobile web edit |
== Definition and classification ==
When multiple languages make use of the same script, there are frequently some differences
=== Script versus writing system ===
"[[Writing system]]" is sometimes treated as a synonym for script. However, it can also refer to the specific concrete writing system supported by a script. For example, the Vietnamese writing system is supported by the Latin script. A writing system may also cover more than one script.
Most writing systems can be broadly divided into several categories: '''logographic''', '''syllabic''', '''alphabetic''' (or '''segmental'''), '''abugida''', '''abjad''' and '''featural'''; however, features of any of these may be found in any given writing system in varying proportions, which often makes it difficult to assign a system to a single category. The term ''[[complex system]]'' is sometimes used to describe those where the admixture makes classification problematic.
Unicode supports all of these types of writing systems through its numerous scripts. Unicode also adds further properties to characters to help differentiate the various characters and the ways they behave within Unicode text
=== {{anchor|Common and inherited scripts}}{{anchor|Special script property values}}Special script property values ===
In addition to explicit or specific script properties, Unicode uses three special values:<ref name=Unicode_script_property>{{cite web|url=https://www.unicode.org/reports/tr24/|title=UAX #24: Unicode Script Property|website=www.unicode.org}}</ref>
;Common: Unicode can assign a character in the [[Universal Character Set|UCS]] to a single script only. However, many characters (those that are not part of a formal natural-language writing system or are unified across many writing systems) may be used in more than one script, for example currency signs, symbols, numerals and punctuation marks. In these cases, Unicode defines them as belonging to the "common" script ([[ISO 15924]] code "Zyyy").
;Inherited: Many diacritics and non-spacing combining characters may be applied to characters from more than one script. In these cases Unicode assigns them to the "inherited" script (ISO 15924 code Zinh), which means that they have the same script class as the base character with which they combine, and so in different contexts they may be treated as belonging to different scripts. For example, {{unichar|0308|Combining Diaeresis|cwith=}} may combine with either {{unichar|0065|Latin Small Letter E}} to create a Latin "ë", or with {{unichar|0435|Cyrillic Small Letter IE}} for the Cyrillic "ё". In the former case, it inherits the Latin script of the base character, whereas in the latter case, it inherits the Cyrillic script of the base character.
;Unknown: The value of "unknown" script (ISO 15924 code Zzzz) is given to unassigned, private
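
The way the three special values interact can be illustrated with a minimal Python sketch. It is not how Unicode implementations actually look up scripts: the authoritative data is the Unicode Character Database, and Python's standard <code>unicodedata</code> module does not expose the script property. The table <code>NAME_PREFIX_TO_SCRIPT</code> and the functions <code>raw_script</code> and <code>resolved_scripts</code> are hypothetical names introduced here, and the name-prefix lookup is only a rough approximation that misclassifies many characters outside the examples shown.

```python
import unicodedata

# Hypothetical, deliberately tiny table mapping the first word of a
# character name to an ISO 15924 code. Real script assignments come from
# the Unicode Character Database, not from character names.
NAME_PREFIX_TO_SCRIPT = {
    "LATIN": "Latn",
    "CYRILLIC": "Cyrl",
    "GREEK": "Grek",
}


def raw_script(ch: str) -> str:
    """Approximate the script property value of a single character."""
    category = unicodedata.category(ch)
    # Unassigned/noncharacter (Cn), private-use (Co) and surrogate (Cs)
    # code points are treated as Unknown.
    if category in ("Cn", "Co", "Cs"):
        return "Zzzz"
    name = unicodedata.name(ch, "")
    # Assumption: generic combining marks are named "COMBINING ...";
    # this sketch treats all of them as Inherited.
    if category in ("Mn", "Me") and name.startswith("COMBINING"):
        return "Zinh"
    prefix = name.split(" ", 1)[0]
    # Anything not in the small table falls back to Common.
    return NAME_PREFIX_TO_SCRIPT.get(prefix, "Zyyy")


def resolved_scripts(text: str) -> list[tuple[str, str]]:
    """Replace Inherited with the script of the preceding base character."""
    result = []
    base_script = None
    for ch in text:
        script = raw_script(ch)
        if script == "Zinh":
            # A mark with no preceding base character stays Inherited.
            script = base_script or "Zinh"
        else:
            base_script = script
        result.append((ch, script))
    return result


# U+0308 takes the script of whichever base letter precedes it.
print(resolved_scripts("e\u0308"))       # Latin e + combining diaeresis
print(resolved_scripts("\u0435\u0308"))  # Cyrillic ie + combining diaeresis
```

In this sketch the combining diaeresis resolves to Latin after U+0065 and to Cyrillic after U+0435, while a currency sign such as "$" falls through to Common and a private-use code point such as U+E000 is reported as Unknown.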
== Character categories within scripts ==