expanded section →Casing: and other sections
Drmccreedy (talk | contribs) m switch to a proper ref using https instead of ftp
(15 intermediate revisions by 7 users not shown)
Line 1:
{{Short description|Unicode code point property names and their uses}}
{{Use British English|date=January 2025}}
The [[Unicode Standard]] assigns various properties to each Unicode character and [[code point]].<ref name="Chapter4">{{cite web|url=https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-4/|date=September 2024|title=The Unicode Standard Version 16 |publisher=The Unicode Consortium |access-date=2024-09-13}}</ref><ref name="UAX44" />
Line 23:
=={{anchor|Name}}Name and alias==
A Unicode character is assigned a unique ''Name'' (na).<ref name="Chapter4"/> The name is composed of uppercase letters A–Z, digits 0–9, [[hyphen-minus]] and [[Space (punctuation)|space]]. Some sequences are excluded: a name may not begin or end with a space or hyphen, and may not contain repeated spaces, repeated hyphens, or a space after a hyphen. The name is guaranteed to be unique within Unicode, and can be used to identify a code point and its character. Ideographic characters, of which there are tens of thousands, are named in the pattern "{{Smallcaps|{{lc:CJK UNIFIED IDEOGRAPH}}}}-''hhhh''", where ''hhhh'' is the hexadecimal code point; an example is {{unichar|4E00}}. Formatting characters also have names, for example {{unichar|00A0}}.
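The naming rules above can be sketched in Python. This is a minimal illustration, not a normative check: the helper <code>is_wellformed_name</code> is hypothetical and encodes only the constraints as stated in this section, while <code>unicodedata.name</code> and <code>unicodedata.lookup</code> are the standard-library functions for mapping between characters and names.

```python
import re
import unicodedata

# Allowed repertoire as described in the text: A-Z, 0-9, space, hyphen-minus.
_ALLOWED = re.compile(r"[A-Z0-9 -]+")


def is_wellformed_name(name: str) -> bool:
    """Hypothetical helper: apply the constraints listed in this section."""
    if not _ALLOWED.fullmatch(name):
        return False  # empty, or contains a character outside the repertoire
    if name[0] in " -" or name[-1] in " -":
        return False  # begins or ends with a space or hyphen
    # Repeated spaces, repeated hyphens, or a space after a hyphen.
    return not any(bad in name for bad in ("  ", "--", "- "))


# Names follow the code point for ideographs and exist for formatting characters.
ideograph_name = unicodedata.name("\u4e00")
nbsp_name = unicodedata.name("\u00a0")

# Because names are unique, a name identifies exactly one character.
nbsp_char = unicodedata.lookup("NO-BREAK SPACE")
```

The name table available to <code>unicodedata</code> depends on the Unicode version bundled with the Python interpreter, so recently added characters may be missing on older installations.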
Code points in the following categories have no Name value assigned: Controls (General Category: Cc), Private use (Co), Surrogate (Cs), Non-characters (Cn) and Reserved (Cn). They may be referenced, informally, by a generic or specific meta-name, called a "Code Point Label": {{not a typo|<control>, <control-0088>, <reserved>, <noncharacter-''hhhh''>, <private-use-''hhhh''>, or <surrogate>}}. Since these labels contain "<" and ">", which are not permitted in a Name, a label can never be confused with a Name.
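A label of this kind can be derived from a code point's General Category. The following is a sketch under stated assumptions: <code>code_point_label</code> and <code>is_noncharacter</code> are hypothetical helper names, the sketch always emits the specific form with a hexadecimal suffix rather than the generic form, and the non-character test assumes the usual set (U+FDD0–U+FDEF plus the last two code points of every plane).

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """Assumed non-character set: U+FDD0..U+FDEF and every U+xxFFFE / U+xxFFFF."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def code_point_label(cp: int) -> str:
    """Return the Name if one exists, otherwise a '<type-hhhh>' style label."""
    ch = chr(cp)
    name = unicodedata.name(ch, None)
    if name is not None:
        return name
    category = unicodedata.category(ch)
    prefix = {"Cc": "control", "Co": "private-use", "Cs": "surrogate"}.get(category)
    if prefix is None:
        if category != "Cn":
            # Named in Unicode but unknown to this interpreter's name table.
            raise ValueError(f"no label for U+{cp:04X} (category {category})")
        # Cn covers both non-characters and reserved (unassigned) code points.
        prefix = "noncharacter" if is_noncharacter(cp) else "reserved"
    return f"<{prefix}-{cp:04X}>"
```

For instance, <code>code_point_label(0x88)</code> yields the <code><control-0088></code> label mentioned above, because U+0088 has General Category Cc and no Name.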
Line 64:
===Casing===
The Case value is normative in Unicode. It pertains to those scripts with uppercase and
<!--(upper, lower, title, folding—both simple and full)-->
Line 76:
In Greek, the letter sigma has different lowercase forms depending on its position in a word. When lowercased, {{Unichar|03a3}} converts to {{Unichar|03c3}} if it is at the start or middle of a word, and converts to {{Unichar|03c2}} if it is at the end of a word.
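The position-dependent behaviour of sigma can be observed with Python's built-in <code>str.lower</code>, which applies a final-sigma rule in CPython 3.3 and later. The sample word and variable names below are illustrative choices only.

```python
CAPITAL_SIGMA = "\u03a3"  # GREEK CAPITAL LETTER SIGMA
SMALL_SIGMA = "\u03c3"    # GREEK SMALL LETTER SIGMA
FINAL_SIGMA = "\u03c2"    # GREEK SMALL LETTER FINAL SIGMA

# "ΣΟΦΟΣ" has a sigma at the start and another at the end of the word.
word = "\u03a3\u039f\u03a6\u039f\u03a3"
lowered = word.lower()

# The same capital letter maps to two different lowercase letters.
first_letter = lowered[0]
last_letter = lowered[-1]

# Uppercasing maps both lowercase forms back to the single capital letter.
round_trip = lowered.upper()

# A sigma standing alone is not preceded by a cased letter,
# so it takes the non-final form.
isolated = CAPITAL_SIGMA.lower()
```

Because two lowercase letters share one uppercase letter, lowercasing needs the surrounding context, while uppercasing does not.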
In Lithuanian, the dot in lowercase i and j is preserved when followed by accents. For example: Í in lowercase is i̇́.<ref>
Despite the existence of {{Unichar|1E9E}}, the uppercase mapping of {{Unichar|00DF}} is the two-letter sequence "SS".
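The sharp s mapping can be checked with Python's string methods, which use the full (multi-character) case mappings. This is a small sketch; the variable names are illustrative only.

```python
SHARP_S = "\u00df"          # LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S

# Uppercasing the small letter expands to two characters, not to U+1E9E.
upper_of_sharp_s = SHARP_S.upper()

# The capital letter nevertheless lowercases to the small sharp s.
lower_of_capital = CAPITAL_SHARP_S.lower()

# The expansion changes string length and does not round-trip.
word = "stra\u00dfe"
upper_word = word.upper()
round_trip = upper_word.lower()

# Case folding, intended for caseless comparison, also expands the letter.
folded = SHARP_S.casefold()
```

Since the uppercased word is one character longer than the original, code that assumes case conversion preserves length can break on such input.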