Unicode character property

 
=={{anchor|Name}}Name and alias==
A Unicode character is assigned a unique ''Name'' (na).<ref name="Chapter4"/> The name is composed of uppercase letters A–Z, digits 0–9, [[hyphen-minus]] and [[Space (punctuation)|space]]. Some sequences are excluded: names beginning with a space or hyphen, names ending with a space or hyphen, repeated spaces or hyphens, and space after hyphen are not allowed. The name is guaranteed to be unique within Unicode, and can be used to identify a code point and its character. Ideographic characters, of which there are tens of thousands, are named in the pattern "{{Smallcaps|{{lc:CJK UNIFIED IDEOGRAPH}}}}-''hhhh''". For example, {{unichar|4E00}}. Formatting characters also have names: {{unichar|00A0}}.
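
The syntax rules above can be sketched as a regular expression. The following Python example is illustrative only: the pattern <code>NAME_SYNTAX</code> and the helper <code>is_well_formed_name</code> are hypothetical names, and the pattern encodes only the restrictions listed in this section, assuming that a hyphen directly after a space is permitted because it is not among the excluded sequences. The standard-library <code>unicodedata</code> module is used to look up the two example characters.

```python
import re
import unicodedata

# Runs of A-Z / 0-9 joined by a single space, a single hyphen, or
# space-then-hyphen. Hyphen-then-space, doubled separators, and
# leading or trailing separators cannot match.
NAME_SYNTAX = re.compile(r"[A-Z0-9]+(?:(?: -|[ -])[A-Z0-9]+)*")


def is_well_formed_name(name: str) -> bool:
    """Check a candidate name against the rules stated in this section."""
    return NAME_SYNTAX.fullmatch(name) is not None


for ch in ("\u4e00", "\u00a0"):
    name = unicodedata.name(ch)
    # A name identifies exactly one code point, so lookup() round-trips.
    assert unicodedata.lookup(name) == ch
    print(f"U+{ord(ch):04X} {name} well-formed={is_well_formed_name(name)}")
```

The sketch checks syntax only; it does not tell whether a well-formed string is actually assigned as a name.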
 
The following Unicode categories do not have a Name value assigned: Controls (General Category: Cc), Private use (Co), Surrogate (Cs), Non-characters (Cn) and Reserved (Cn). They may be referenced, informally, by a generic or specific meta-name, called "Code Point Labels": {{not a typo|<control>, <control-0088>, <reserved>, <noncharacter-''hhhh''>, <private-use-''hhhh''>, or <surrogate>}}. Since these labels contain "<" and ">", they can never appear in a Name, which prevents confusion.
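
As a rough illustration, a label can be derived from the General Category of a code point. In the Python sketch below, the function <code>code_point_label</code> is a hypothetical helper, not part of any library. It always emits the specific form with a hexadecimal suffix, and it assumes that the reserved and surrogate labels take the same ''hhhh'' suffix as the other specific labels. The result depends on the Unicode version bundled with the Python interpreter, since code points assigned later are still reported as Cn.

```python
import unicodedata


def code_point_label(cp: int) -> str:
    """Return the Name of a code point, or a code point label if it has none."""
    ch = chr(cp)
    category = unicodedata.category(ch)
    if category == "Cc":
        return f"<control-{cp:04X}>"
    if category == "Co":
        return f"<private-use-{cp:04X}>"
    if category == "Cs":
        return f"<surrogate-{cp:04X}>"
    if category == "Cn":
        # Noncharacters: U+FDD0..U+FDEF and the last two code points of each plane.
        if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
            return f"<noncharacter-{cp:04X}>"
        return f"<reserved-{cp:04X}>"
    # Every other category carries a real Name.
    return unicodedata.name(ch)


for cp in (0x0088, 0xE000, 0xD800, 0xFFFE, 0x0041):
    print(f"U+{cp:04X} {code_point_label(cp)}")
```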
 
===Casing===
The Case value is normative in Unicode. It pertains to those scripts that have both uppercase and lowercase letters. Case difference occurs in the Adlam, Armenian, Cherokee, Coptic, Cyrillic, Deseret, Garay, Glagolitic, Greek, Khutsuri and Mkhedruli Georgian, Latin, Medefaidrin, Old Hungarian, Osage, Vithkuqi and Warang Citi scripts.
 
<!--(upper, lower, title, folding—both simple and full)-->
In Greek, the letter sigma has different lowercase forms depending on where it is in a word. {{Unichar|03a3}} converts to {{Unichar|03c3}} if it is at the start or middle of a word, and converts to {{Unichar|03c2}} if it is at the end of a word.
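
The context-sensitive sigma mapping can be observed in implementations that apply it. The short Python example below assumes the CPython behaviour of <code>str.lower()</code>, which applies the final-sigma rule; other languages and libraries may use only the simple one-to-one mapping. The variable names are illustrative.

```python
# The same capital sigma (U+03A3) appears at the start and at the end
# of this word.
word = "\u03a3\u039f\u03a6\u039f\u03a3"
lowered = word.lower()

# Show which lowercase sigma was chosen at each position.
for upper, lower in zip(word, lowered):
    print(f"U+{ord(upper):04X} -> U+{ord(lower):04X}")

# A lone capital sigma is not at the end of a word, so it maps to U+03C3.
isolated = "\u03a3".lower()
```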
 
In Lithuanian, the dot in lowercase i and j is preserved when followed by accents. For example: Í in lowercase is i̇́.<ref>{{Cite web|url=https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt|title=Unicode Character Database: Special Casing Data|date=2024-05-10}}</ref>
 
Despite the existence of {{Unichar|1E9E}}, the uppercase mapping of {{Unichar|00DF}} is "SS".
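
A minimal Python check of this mapping, assuming an interpreter whose <code>str.upper()</code> and <code>str.lower()</code> apply the full (one-to-many) case mappings, as CPython does:

```python
sharp_s = "\u00df"        # LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S

# Uppercasing expands the single letter to two letters.
upper_of_small = sharp_s.upper()

# The capital letter still lowercases to the small sharp s,
# so the two mappings are not inverses of each other.
lower_of_capital = capital_sharp_s.lower()

print(upper_of_small, lower_of_capital)
print("Stra\u00dfe".upper())
```

Because uppercasing changes the length of the string, code that assumes a one-to-one correspondence between input and output characters can break on such text.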