Unicode character property: Difference between revisions

Tag: Reverted
m switch to a proper ref using https instead of ftp
 
(9 intermediate revisions by 6 users not shown)
Line 18:
*<code>decomposition</code> type or <mapping> = letter + diacritic, ligature X Y, superscript X, font X, initial X, medial X, final X, isolated X, vertical X, etc.
*<code>gc</code> = general category [letter, symbol, digit, punctuation, case behaviour, etc.]
*<code>nv</code> = numeric type and value [of a digit]. If the numeric type is 'decimal', all three slots are filled. If it is 'digit', the first slot is null. (This has been discontinued.) If it is 'numeric', the first two slots are null and only the last is used.
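
As a rough illustration of the <code>gc</code>, <code>decomposition</code> and <code>nv</code> fields listed above, the following sketch queries them through Python's standard <code>unicodedata</code> module. The helper name <code>describe</code> and the choice of sample characters are assumptions made for this example only; the three-slot tuple mirrors the decimal/digit/numeric layout described above, with <code>None</code> standing in for a null slot.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few UCD fields for one character (hypothetical helper)."""

    def safe(fn):
        # The unicodedata lookups raise ValueError when a slot is empty;
        # map that to None to mirror a null field.
        try:
            return fn(ch)
        except ValueError:
            return None

    return {
        "gc": unicodedata.category(ch),
        "decomposition": unicodedata.decomposition(ch),
        # Three numeric slots: decimal, digit, numeric.
        "nv": (
            safe(unicodedata.decimal),
            safe(unicodedata.digit),
            safe(unicodedata.numeric),
        ),
    }


# DIGIT FIVE, SUPERSCRIPT TWO, VULGAR FRACTION ONE HALF, LATIN SMALL LIGATURE FI
for ch in "5\u00b2\u00bd\ufb01":
    print(f"U+{ord(ch):04X}", describe(ch))
```

In this sketch a decimal digit fills all three slots, a superscript digit leaves the first slot empty, and a fraction fills only the last, which matches the pattern described for <code>nv</code>.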
 
The property between <code>alias</code> and <code>upper case</code> is obsolete and is now null for all Unicode characters.
Line 64:
 
===Casing===
The Case value is normative in Unicode. It pertains to those scripts that have both uppercase and lowercase letters. Case differences occur in the Adlam, Armenian, Cherokee, Coptic, Cyrillic, Deseret, Garay, Glagolitic, Greek, Khutsuri and Mkhedruli Georgian, Latin, Medefaidrin, Old Hungarian, Osage, Vithkuqi and Warang Citi scripts.
 
<!--(upper, lower, title, folding—both simple and full)-->
Line 76:
In Greek, the letter sigma has different lowercase forms depending on where it is in a word. {{Unichar|03a3}} converts to {{Unichar|03c3}} if it is at the start or middle of a word, and converts to {{Unichar|03c2}} if it is at the end of a word.
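
The position-dependent lowercasing of sigma can be observed with Python's built-in <code>str.lower()</code>, which applies the final-sigma rule in CPython 3.3 and later. This is only a sketch of the behaviour described above, not a reference implementation; the sample words and variable names are assumptions chosen for the example.

```python
# Uppercase Greek words whose last letter is CAPITAL SIGMA (U+03A3).
word = "\u039f\u0394\u039f\u03a3"          # ΟΔΟΣ
lowered = word.lower()

# Show the code point of each lowercased letter.
print([f"U+{ord(c):04X}" for c in lowered])

# A word with sigma at both the start and the end.
both = "\u03a3\u039f\u03a6\u039f\u03a3".lower()   # ΣΟΦΟΣ
print([f"U+{ord(c):04X}" for c in both])

# An isolated capital sigma is not at the end of a word of cased letters.
isolated = "\u03a3".lower()
print(f"U+{ord(isolated):04X}")
```

The word-final sigma becomes U+03C2, while the initial and isolated ones become U+03C3.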
 
In Lithuanian, the dot in lowercase i and j is preserved when followed by accents. For example: Í in lowercase is i̇́.<ref>{{Cite web|url=https://www.unicode.org/Public/UCD/latest/ucd/SpecialCasing.txt|title=Unicode Character Database: Special Casing Data|date=2024-05-10}}</ref>
 
Despite the existence of {{Unichar|1E9E}}, {{Unichar|00DF}} corresponds to "SS".
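
A minimal sketch of this asymmetry, using Python's built-in string methods (the variable names are assumptions made for the example): uppercasing U+00DF yields the two-letter sequence "SS", whereas lowercasing U+1E9E yields U+00DF, so the mapping does not round-trip.

```python
sharp_s = "\u00df"          # LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S

upper_of_small = sharp_s.upper()          # full case mapping, two characters
lower_of_capital = capital_sharp_s.lower()

print(upper_of_small, len(upper_of_small))
print(lower_of_capital == sharp_s)

# Uppercasing can therefore change the length of a string.
street = "stra\u00dfe"
print(street.upper(), len(street), len(street.upper()))
```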