Unicode character property: Difference between revisions

Bidirectional writing: Corrected markup per MOS:BOLD and MOS:WAW, other tweaks
Numeric values and types: Corrected markup per MOS:BOLD and MOS:WAW
 
===Decimal===
Characters are classified with a '''Numeric type'''.<ref name="Chapter4"/> Characters such as fractions, subscripts, superscripts, Roman numerals, currency numerators, encircled numbers, and script-specific digits have a numeric type other than "None". They have a '''numeric value''' that can be decimal, including zero and negatives, or a vulgar fraction. If there is no such value, as is the case for most characters, the numeric type is "None".
 
The characters that do have a numeric value are divided into three groups: Decimal (De), Digit (Di) and Numeric (Nu, i.e. all others). "Decimal" means the character is a straight decimal digit. Only characters that are part of a contiguous encoded range 0..9 have numeric type Decimal. Other digits, like superscripts, have numeric type Digit. All remaining numeric characters, like fractions and Roman numerals, end up with the type "Numeric". The intended effect is that a simple parser can use these decimal numeric values without being distracted by, say, a numeric superscript or a fraction. Eighty-three CJK Ideographs that represent a number, including those used for accounting, are typed Numeric.
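The three-way split can be explored with a minimal Python sketch. It assumes that the standard-library <code>unicodedata</code> functions <code>decimal()</code>, <code>digit()</code> and <code>numeric()</code> mirror the three numeric fields of the Unicode Character Database; the helper name <code>numeric_type</code> is hypothetical and not part of any library.

```python
import unicodedata

def numeric_type(ch: str) -> str:
    """Return 'Decimal', 'Digit', 'Numeric' or 'None' for a single character.

    Hypothetical helper: the checks run from the narrowest group to the
    widest, because a Decimal character also has a digit and a numeric value.
    """
    if unicodedata.decimal(ch, None) is not None:
        return "Decimal"
    if unicodedata.digit(ch, None) is not None:
        return "Digit"
    if unicodedata.numeric(ch, None) is not None:
        return "Numeric"
    return "None"

# ASCII digit, Arabic-Indic digit, superscript two, vulgar fraction one half,
# Roman numeral eight, and a plain Latin letter.
for ch in ["7", "\u0663", "\u00b2", "\u00bd", "\u2167", "A"]:
    print(f"U+{ord(ch):04X}", numeric_type(ch), unicodedata.numeric(ch, None))
```

The results depend on the version of the Unicode data bundled with the Python interpreter in use.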
 
On the other hand, characters that could have a numeric value as a second meaning are still marked Numeric type ''None'' and have no numeric value (""). For example, Latin letters can be used in paragraph numbering like "II.A.1.b", but the letters "I", "A" and "b" are not numeric (type ''None'') and have no numeric value.
{{Numeric Type (Unicode)}}
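The "simple parser" mentioned above might look like the following sketch. It is an illustration only: <code>parse_decimal</code> is a hypothetical helper, and it assumes that Python's <code>unicodedata.decimal()</code> returns a value only for characters of numeric type Decimal.

```python
import unicodedata

def parse_decimal(text: str) -> int:
    """Parse a run of Decimal-type characters into an integer.

    Hypothetical helper: superscripts, fractions and letters are rejected
    because they have no decimal value.
    """
    if not text:
        raise ValueError("empty input")
    value = 0
    for ch in text:
        d = unicodedata.decimal(ch, None)
        if d is None:
            raise ValueError(f"U+{ord(ch):04X} is not numeric type Decimal")
        value = value * 10 + d
    return value

print(parse_decimal("2024"))          # ASCII digits
print(parse_decimal("\u0663\u0664"))  # Arabic-Indic digits three and four
```

Passing a superscript such as U+00B2, or the letters of "II", raises <code>ValueError</code>. The sketch does not check that all digits come from the same script; a stricter parser might add that test.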
 
===Hexadecimal digits===
[[Hexadecimal]] characters are those in the series with hexadecimal values 0...9ABCDEF (sixteen characters, decimal value 0–15). The character property '''Hex_Digit''' is set to Yes when a character is in such a series:
 
{{Hexadecimal digit (Unicode)}}
 
Forty-four characters are marked as ''Hex_Digit''. The ones in the Basic Latin block are also marked as '''ASCII_Hex_Digit'''.
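The count can be checked with a short Python sketch. Python's standard <code>unicodedata</code> module does not expose ''Hex_Digit'', so the code point ranges below are hard-coded; they are an assumption transcribed from the property listing (the ASCII digits and letters A–F and a–f, plus their fullwidth counterparts) and should be checked against the Unicode Character Database before being relied on. The names <code>HEX_DIGIT</code>, <code>ASCII_HEX_DIGIT</code> and <code>is_hex_digit</code> are hypothetical.

```python
# Assumed Hex_Digit ranges: ASCII 0-9, A-F, a-f and the fullwidth forms.
HEX_DIGIT_RANGES = [
    (0x0030, 0x0039), (0x0041, 0x0046), (0x0061, 0x0066),
    (0xFF10, 0xFF19), (0xFF21, 0xFF26), (0xFF41, 0xFF46),
]

HEX_DIGIT = {
    chr(cp)
    for first, last in HEX_DIGIT_RANGES
    for cp in range(first, last + 1)
}

# ASCII_Hex_Digit is the subset that lies in the Basic Latin block.
ASCII_HEX_DIGIT = {ch for ch in HEX_DIGIT if ord(ch) < 0x80}

def is_hex_digit(ch: str) -> bool:
    """True if the single character ch is in the assumed Hex_Digit set."""
    return ch in HEX_DIGIT

print(len(HEX_DIGIT), len(ASCII_HEX_DIGIT))
```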
 
Unicode has no separate characters for hexadecimal values. As a consequence, when regular characters are used it is not possible to determine from the characters alone whether a hexadecimal value is intended, or even whether a value is intended at all. That should be determined at a higher level, e.g. by prepending "0x" to a hexadecimal number, or by context. All that Unicode can note is that a sequence ''can or cannot'' be a hexadecimal value.
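The distinction between "can be a hexadecimal value" and "is meant as one" can be illustrated with a small Python sketch. It is restricted to ASCII hexadecimal digits, and the "0x" prefix rule is just one possible higher-level convention; <code>could_be_hex</code> and <code>read_hex_literal</code> are hypothetical helper names.

```python
import string

def could_be_hex(token: str) -> bool:
    """True if every character is an ASCII hexadecimal digit.

    This only says the token *can* be a hexadecimal value.
    """
    return token != "" and all(ch in string.hexdigits for ch in token)

def read_hex_literal(token: str):
    """Return the integer value only when a '0x' prefix signals the intent."""
    if token[:2].lower() == "0x" and could_be_hex(token[2:]):
        return int(token[2:], 16)
    return None

for token in ["beef", "0xbeef", "bees"]:
    print(token, could_be_hex(token), read_hex_literal(token))
```

Here "beef" passes the character-level test but is not read as a number, because nothing at a higher level marks it as one.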
 
==Block==