Unicode character property: Difference between revisions

Revision as of 21:26, 2 January 2025 by Symbol & Font Hunter (390 edits): expanded section →Casing: and other sections (Tag: Visual edit)
Latest revision as of 22:28, 11 June 2025 by Drmccreedy (Extended confirmed users, Template editors; 26,287 edits): m switch to a proper ref using https instead of ftp
(15 intermediate revisions by 7 users not shown)
Line 1:
{{Short description|Unicode code point property names and their uses}}
{{Use British English|date=January 2025}}
The [[Unicode Standard]] assigns various properties to each Unicode character and [[code point]].<ref name="Chapter4">{{cite web|url=https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-4/|date=September 2024|title=The Unicode Standard Version 16 |publisher=The Unicode Consortium |access-date=2024-09-13}}</ref><ref name="UAX44" />

Line 23:
=={{anchor|Name}}Name and alias==
A Unicode character is assigned a unique ''Name'' (na).<ref name="Chapter4"/> The name is composed of uppercase letters A–Z, digits 0–9, [[hyphen-minus]] and [[Space (punctuation)|space]]. Some sequences are excluded: names beginning with a space or hyphen, names ending with a space or hyphen, repeated spaces or hyphens, and space after hyphen are not allowed. The name is guaranteed to be unique within Unicode, and can be used to identify a code point and its character.

Ideographic characters, of which there are tens of thousands, are named in the pattern "{{Smallcaps|{{lc:CJK UNIFIED IDEOGRAPH}}}}-''hhhh''". For example, {{unichar|4E00}}. Formatting characters also have names: {{unichar|00A0}}.

The following Unicode categories do not have a Name value assigned: Controls (General Category: Cc), Private use (Co), Surrogate (Cs), Non-characters (Cn) and Reserved (Cn). They may be referenced, informally, by a generic or specific meta-name, called "Code Point Labels": {{not a typo|<control>, <control-0088>, <reserved>, <noncharacter-''hhhh''>, <private-use-''hhhh''>, or <surrogate>}}. Since these labels contain "<" and ">", they can never appear in a Name, which prevents confusion.

Line 64:
===Casing===
The Case value is normative in Unicode. It pertains to those scripts with uppercase and lowercase letters.
Case differences occur in the Adlam, Armenian, Cherokee, Coptic, Cyrillic, Deseret, Garay, Glagolitic, Greek, Khutsuri and Mkhedruli Georgian, Latin, Medefaidrin, Old Hungarian, Osage, Vithkuqi and Warang Citi scripts. <!--(upper, lower, title, folding—both simple and full)-->

Line 76:
In Greek, the letter sigma has different lowercase forms depending on where it is in a word. {{Unichar|03a3}} converts to {{Unichar|03c3}} if it is at the start or middle of a word, and converts to {{Unichar|03c2}} if it is at the end of a word. In Lithuanian, the dot in lowercase i and j is preserved when followed by accents. For example: Í in lowercase is i̇́.<ref>{{Cite web|url=https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt|title=Unicode Character Database: Special Casing Data|date=2024-05-10}}</ref> Despite the existence of {{Unichar|1E9E}}, the uppercase form of {{Unichar|00DF}} is "SS".
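
The special-casing behaviour described above can be observed in a minimal Python sketch. It assumes CPython 3.3 or later, whose <code>str.lower()</code> and <code>str.upper()</code> apply the locale-independent full case mappings; the variable names are illustrative only. Language-specific tailorings such as the Lithuanian rule are not applied by these built-in methods, so the last check shows only the untailored default.

```python
import unicodedata

# Greek capital sigma: the lowercase form depends on position in the word.
word = "\u03A3\u0391\u03A3"  # SIGMA ALPHA SIGMA
lowered = word.lower()
first_name = unicodedata.name(lowered[0])
last_name = unicodedata.name(lowered[-1])

# An isolated capital sigma is not word-final, so it maps to the plain form.
lone_name = unicodedata.name("\u03A3".lower())

# U+00DF uppercases to two letters, although U+1E9E exists as a capital form.
sharp_s_upper = "\u00DF".upper()
capital_sharp_s_lower = "\u1E9E".lower()

# Default (untailored) lowercase of U+00CD; the Lithuanian tailoring that
# keeps the dot is NOT applied by str.lower().
i_acute_lower = "\u00CD".lower()

print(first_name)
print(last_name)
print(lone_name)
print(sharp_s_upper, capital_sharp_s_lower == "\u00DF")
print(i_acute_lower == "\u00ED")
```

Locale-sensitive results, such as the Lithuanian dotted form, would need a library that implements language tailorings; the built-in string methods shown here do not.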