Script (Unicode)

== Definition and classification ==
 
When multiple languages make use of the same script, there are frequently some differences, particularly in diacritics and other marks. For example, Swedish and English both use the Latin script. However, [[Swedish alphabet|Swedish]] includes the character ''[[å]]'' (sometimes called a "Swedish ''O''"), while English has no such character. Nor does English use the diacritic ''[[combining circle above]]'' on any character. In general, languages sharing the same script share many of the same characters. Despite these peripheral differences between the Swedish and English writing systems, both are said to use the same Latin script. Thus, the Unicode abstraction of scripts is a basic organizing technique. The differences among alphabets and writing systems remain, and they are supported through Unicode's flexible scripts, combining marks, and collation algorithms.
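
The role of combining marks can be checked with Python's standard <code>unicodedata</code> module. This minimal sketch shows that the precomposed Swedish ''å'' (U+00E5) canonically decomposes into the ordinary Latin letter ''a'' followed by U+030A, whose Unicode name is COMBINING RING ABOVE. The helper name <code>decompose</code> is hypothetical and introduced only for illustration.

```python
import unicodedata

def decompose(text: str) -> list[str]:
    """Return the Unicode names of the code points in the NFD form of text."""
    # NFD (canonical decomposition) splits a precomposed letter into a
    # base letter followed by its combining marks.
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", text)]

# Swedish "å" is a Latin "a" plus a combining ring above.
print(decompose("\u00e5"))  # ['LATIN SMALL LETTER A', 'COMBINING RING ABOVE']
# The English letter "a" carries no combining mark.
print(decompose("a"))       # ['LATIN SMALL LETTER A']
```

Recomposing with NFC (<code>unicodedata.normalize("NFC", "a\u030a")</code>) yields the single code point U+00E5 again, so both spellings stay within the Latin script plus one combining mark.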
 
=== Script versus writing system ===
 
"[[Writing system]]" is sometimes treated as a synonym for script. However, the term can also refer to a specific, concrete writing system supported by a script. For example, the Vietnamese writing system is supported by the Latin script. A writing system may also draw on more than one script; for example, the Japanese writing system makes use of the [[Kanji|Han]], [[Hiragana]], and [[Katakana]] scripts.
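
The distinction can be illustrated by listing the scripts a short Japanese string draws on. Python's standard <code>unicodedata</code> module does not expose the Unicode Script property, so this sketch relies on a hypothetical heuristic that guesses a script from the prefix of each character's Unicode name; <code>NAME_PREFIXES</code>, <code>guess_script</code>, and <code>scripts_used</code> are illustrative names, not part of any library.

```python
import unicodedata

# Hypothetical heuristic: map a character-name prefix to a script label.
# This only approximates the Unicode Script property.
NAME_PREFIXES = {
    "CJK UNIFIED IDEOGRAPH": "Han",
    "HIRAGANA": "Hiragana",
    "KATAKANA": "Katakana",
    "LATIN": "Latin",
}

def guess_script(ch: str) -> str:
    """Guess the script of one character from its Unicode name."""
    name = unicodedata.name(ch, "")  # empty string if the character has no name
    for prefix, script in NAME_PREFIXES.items():
        if name.startswith(prefix):
            return script
    return "Other"

def scripts_used(text: str) -> set[str]:
    """Collect the heuristically guessed scripts used in text."""
    return {guess_script(ch) for ch in text}

# "日本語のテキスト" ("Japanese text") mixes Han, Hiragana, and Katakana.
print(sorted(scripts_used("日本語のテキスト")))  # ['Han', 'Hiragana', 'Katakana']
```

The heuristic is deliberately crude: for example, U+30FC KATAKANA-HIRAGANA PROLONGED SOUND MARK has a name beginning with "KATAKANA" but belongs to the Common script, which a lookup of the real Script property would report correctly.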
 
Most writing systems can be broadly divided into several categories: '''logographic''', '''syllabic''', '''alphabetic''' (or '''segmental'''), '''abugida''', '''abjad''', and '''featural'''. However, features of several of these categories may be found in a given writing system in varying proportions, which often makes it difficult to assign a system to a single category. The term ''[[complex system]]'' is sometimes used to describe systems in which this admixture makes classification problematic.
 
Unicode supports all of these types of writing systems through its numerous scripts. Unicode also assigns further properties to characters to help differentiate the various characters and the ways they behave within Unicode text-processing algorithms.
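
To illustrate the kind of additional properties meant here, this sketch uses Python's standard <code>unicodedata</code> module to read three of them for a few characters: General_Category, Bidi_Class, and Canonical_Combining_Class. The choice of properties and the helper name <code>describe</code> are illustrative assumptions; the module does not provide the Script property itself.

```python
import unicodedata

def describe(ch: str) -> dict:
    """Report a few Unicode character properties other than script."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),     # General_Category, e.g. Lu, Mn
        "bidi": unicodedata.bidirectional(ch),   # Bidi_Class, e.g. L, R, NSM
        "combining": unicodedata.combining(ch),  # Canonical_Combining_Class
    }

# A Latin capital letter, a combining diaeresis, and a Hebrew letter.
for ch in ["A", "\u0308", "\u05d0"]:
    print(describe(ch))
```

The combining diaeresis, for instance, reports category Mn (nonspacing mark) and combining class 230, while the Hebrew letter alef reports the right-to-left bidi class R; text-processing algorithms such as normalization and bidirectional layout consult these values.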
 
=== {{anchor|Common and inherited scripts}}{{anchor|Special script property values}}Special script property values ===
In addition to the values that identify specific scripts, the Unicode Script property uses three special values:<ref name=Unicode_script_property>{{cite web|url=https://www.unicode.org/reports/tr24/|title=UAX #24: Unicode Script Property|website=www.unicode.org}}</ref>
;Common: Unicode can assign a character in the [[Universal Character Set|UCS]] to only a single script. However, many characters, such as those that are not part of a formal natural-language writing system or that are unified across many writing systems, may be used in more than one script. Examples include currency signs, symbols, numerals, and punctuation marks. In these cases, Unicode defines them as belonging to the "common" script ([[ISO 15924]] code Zyyy).
;Inherited: Many diacritics and non-spacing combining characters may be applied to characters from more than one script. In these cases, Unicode assigns them to the "inherited" script (ISO 15924 code Zinh), which means that they take on the script of the base character with which they combine, so in different contexts they may be treated as belonging to different scripts. For example, {{unichar|0308|Combining Diaeresis|cwith=}} may combine either with {{unichar|0065|Latin Small Letter E}} to form the Latin "ë" or with {{unichar|0435|Cyrillic Small Letter IE}} to form the Cyrillic "ё". In the former case, it inherits the Latin script of the base character; in the latter case, it inherits the Cyrillic script.
;Unknown: The "unknown" script value (ISO 15924 code Zzzz) is given to unassigned, private-use, noncharacter, and surrogate code points.
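
The behavior of the three special values can be sketched in Python. Because the standard <code>unicodedata</code> module does not expose the Script property, the <code>SCRIPT</code> table below is a hypothetical, hand-copied excerpt covering only the characters in the example; <code>script_of</code> and <code>resolved_scripts</code> are illustrative names. Resolution of Inherited is simplified to "take the script of the preceding character".

```python
import unicodedata

# Hypothetical excerpt of Script property values, copied by hand for
# illustration; Python's unicodedata module does not expose this property.
SCRIPT = {
    0x0021: "Common",     # EXCLAMATION MARK
    0x0065: "Latin",      # LATIN SMALL LETTER E
    0x0308: "Inherited",  # COMBINING DIAERESIS
    0x0435: "Cyrillic",   # CYRILLIC SMALL LETTER IE
}

def script_of(ch: str) -> str:
    """Look up the Script value of one code point in the excerpt above."""
    # Unassigned (including noncharacters), private-use, and surrogate code
    # points have General_Category Cn, Co, or Cs and get the Unknown value.
    if unicodedata.category(ch) in ("Cn", "Co", "Cs"):
        return "Unknown"
    return SCRIPT[ord(ch)]  # raises KeyError for code points not in the excerpt

def resolved_scripts(text: str) -> list[str]:
    """Return one script per code point, resolving Inherited from the base."""
    result: list[str] = []
    for ch in text:
        script = script_of(ch)
        # Simplified rule: an Inherited mark takes the resolved script of the
        # character it follows (the base it combines with).
        if script == "Inherited" and result:
            script = result[-1]
        result.append(script)
    return result

print(resolved_scripts("e\u0308"))       # ['Latin', 'Latin']
print(resolved_scripts("\u0435\u0308"))  # ['Cyrillic', 'Cyrillic']
```

The same U+0308 thus resolves to Latin after "e" and to Cyrillic after "е", while a private-use character such as U+E000 resolves to Unknown. A production implementation would load the complete mapping from the Unicode Character Database file Scripts.txt rather than a hand-written table.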
 
== Character categories within scripts ==