Unicode block: Difference between revisions

[[File:Unicode logo.svg|right]]
{{SpecialChars}}
[[File:Writing systems worldwide.png|460px|thumb
| {{Navbox with columns |child |style=font-size:90%;
| abovestyle=background:transparent;font-size:110%;padding:0;font-weight:bold;
| above = Index of predominant national and selected regional or minority scripts
| colheaderstyle=padding:0.15em 0.15em 0.25em;font-weight:normal; |colstyle=white-space:nowrap;
| col1header = [[Alphabet]]ic
| col1 = {{legend|#aaa|[[#Latin script|Latin]]}} {{legend|#008080|[[#Cyrillic|Cyrillic]]}} {{legend|blue|[[#Greek and Coptic|Greek]]}} {{legend|#1E90FF|[[#Armenian|Armenian]]}} {{legend|#00FFFF|[[#Georgian|Georgian]]}} {{legend|#008080|[[#Chinese, Japanese and Korean|Hangul]] {{sup|a}}}}
| col2header = {{longitem|[[Logogram|[L]ogographic]]<br/>and [[Syllabary|[S]yllabic]]}}
| col2 = {{legend|#8B0000|[[#Chinese, Japanese and Korean|Hanzi]] {{smaller|[L]}}}} {{legend|#FF0000|[[#Chinese, Japanese and Korean|Kana]] {{smaller|[S]}}{{\}}[[#Chinese, Japanese and Korean|Kanji]] {{smaller|[L]}}{{nbsp|2}}}} {{legend|#FF00FF|[[#Chinese, Japanese and Korean|Hanja]]{{sup|b}} {{smaller|[L]}}}}
| col3header = [[Abjad]]
| col3 = {{legend|green|[[#Semitic languages|Arabic]]}} {{legend|#7CFC00|[[#Semitic languages|Hebrew]]}}
| col4header = [[Abugida]]
| col4 = {{legend|#FFA500|[[#Brahmic (Indic) scripts|North Indic]]}} {{legend|#D2691E|[[#Brahmic (Indic) scripts|South Indic]]}} {{legend|#8B4513|[[#Ethiopic|Ethiopic]]}} {{legend|#808000|[[#Thaana|Thaana]]}} {{legend|#EEE8AA|[[#Native American scripts|Canadian syllabic]]}}
| below = {{nowrap|{{sup|a}} [[Featural alphabet|Featural-alphabetic]].{{nbsp|3}}{{sup|b}} Limited.}}
}}
]]
 
==Character reference overview==
{{See also|List of XML and HTML character entity references|Unicode input}}
An [[HTML]] or [[XML]] ''numeric character reference'' refers to a character by its [[Universal Character Set]]/[[Unicode]] ''code point'', and uses the format
 
:<code>&#</code>''nnnn''<code>;</code>
or
:<code>&#x</code>''hhhh''<code>;</code>
 
where ''nnnn'' is the code point in [[decimal]] form, and ''hhhh'' is the code point in [[hexadecimal]] form. The ''x'' must be lowercase in XML documents. The ''nnnn'' or ''hhhh'' may be any number of digits and may include leading zeros. The ''hhhh'' may mix uppercase and lowercase, though uppercase is the usual style.
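
The two numeric formats can be illustrated with a minimal Python sketch. The function names <code>to_decimal_ref</code>, <code>to_hex_ref</code> and <code>decode_numeric_ref</code> are hypothetical and chosen only for this illustration; the decoder accepts only the lowercase ''x'' form, as XML requires, and does not check whether the resulting code point is permitted in a given markup language.

```python
import re

# Matches "&#nnnn;" (decimal) or "&#xhhhh;" (hexadecimal, lowercase x only).
_NUMERIC_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def to_decimal_ref(ch: str) -> str:
    """Return the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"


def to_hex_ref(ch: str) -> str:
    """Return the hexadecimal reference, using uppercase digits (the usual style)."""
    return f"&#x{ord(ch):X};"


def decode_numeric_ref(ref: str) -> str:
    """Decode a single numeric character reference into its character."""
    match = _NUMERIC_REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a numeric character reference: {ref!r}")
    hex_digits, dec_digits = match.groups()
    # Leading zeros and mixed-case hexadecimal digits are both accepted.
    code_point = int(hex_digits, 16) if hex_digits else int(dec_digits, 10)
    return chr(code_point)


print(to_decimal_ref("\u00e9"))        # &#233;
print(to_hex_ref("\u00e9"))            # &#xE9;
print(decode_numeric_ref("&#x00e9;") == "\u00e9")  # True
```

Both <code>&#233;</code> and <code>&#xE9;</code> therefore refer to the same character, U+00E9.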
 
In contrast, a ''character entity reference'' refers to a character by the name of an ''[[SGML entity|entity]]'' which has the desired character as its ''replacement text''. The entity must either be predefined (built into the markup language) or explicitly declared in a [[Document Type Definition]] (DTD). The format is the same as for any entity reference:
 
:<code>&</code>''name''<code>;</code>
 
where ''name'' is the case-sensitive name of the entity. The semicolon is required.
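
As an illustration of name-based lookup, the following Python sketch resolves an entity reference using the HTML entity table shipped in Python's standard library (<code>html.entities.name2codepoint</code>). The helper name <code>resolve_entity</code> is hypothetical, and the table stands in for the set of predefined entities; entities declared in a DTD are not handled.

```python
from html.entities import name2codepoint


def resolve_entity(ref: str) -> str:
    """Resolve a reference of the form "&name;" to its replacement character.

    The lookup is case-sensitive, and the trailing semicolon is required.
    """
    if len(ref) < 3 or not (ref.startswith("&") and ref.endswith(";")):
        raise ValueError(f"not an entity reference: {ref!r}")
    name = ref[1:-1]
    if name not in name2codepoint:
        raise KeyError(f"entity is not predefined in this table: {name!r}")
    return chr(name2codepoint[name])


# "eacute" and "Eacute" are distinct entities because names are case-sensitive.
print(resolve_entity("&eacute;") == "\u00e9")  # True
print(resolve_entity("&Eacute;") == "\u00c9")  # True
```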
 
==Unicode blocks==
In '''[[Unicode]]''', a '''block''' is defined as one contiguous range of [[code point]]s. Blocks are uniquely named and do not [[intersection (set theory)|overlap]]. Each block starts at a code point of the form ''nnn''0 and ends at a code point of the form ''nnn''F (in hexadecimal), so its size is always a multiple of 16. A block may explicitly include code points that are [[General Category|unassigned or non-characters]].<ref>[http://www.unicode.org/glossary/#B Unicode glossary]</ref> Code points that do not belong to any named block, e.g. those in the unassigned [[Plane (Unicode)|planes]] 3–13, have the Block property value <code>No_Block</code>.
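
The properties above can be sketched in Python as follows. The table <code>SAMPLE_BLOCKS</code> is a deliberately small, hand-picked excerpt and the function names are hypothetical; a complete lookup would instead load every range from the Unicode Character Database file <code>Blocks.txt</code>. With this partial table, a code point in a block that is not listed is also reported as <code>No_Block</code>.

```python
# A small excerpt of the block list: (first code point, last code point, name).
# Assumption: illustrative only; the real list is much longer.
SAMPLE_BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
]


def is_well_formed(blocks) -> bool:
    """Check the nnn0..nnnF boundary form and the absence of overlap."""
    ordered = sorted(blocks)
    for start, end, _name in ordered:
        # A start of the form nnn0 is a multiple of 16; an end of the form
        # nnnF is one less than a multiple of 16.
        if start % 16 != 0 or end % 16 != 15 or start > end:
            return False
    # Each block must begin after the previous block has ended.
    return all(prev[1] < cur[0] for prev, cur in zip(ordered, ordered[1:]))


def block_of(code_point: int, blocks=SAMPLE_BLOCKS) -> str:
    """Return the name of the block containing code_point, else "No_Block"."""
    for start, end, name in blocks:
        if start <= code_point <= end:
            return name
    return "No_Block"


print(is_well_formed(SAMPLE_BLOCKS))  # True
print(block_of(0x03A9))               # Greek and Coptic
print(block_of(0x50000))              # No_Block
```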