Content deleted Content added
R. S. Shaw (talk | contribs) all Unicodings are multbyte, but only some are variable-width. |
→[[Unicode]] variable-width encodings: remove double reference to utf-32 and some other edits |
||
Line 20:
On the PC ([[MS-DOS]] and [[Microsoft Windows]] platforms), two encodings became established for Japanese and Traditional Chinese in which all of singletons, lead units and trail units overlapped: [[Shift-JIS]] and [[Big5]] respectively. In Shift-JIS, lead units had the range 81-9F and E0-FC, trail units had the range 40-7E and 80-FC, and singletons had the range 21-7E and A1-DF. In Big5, lead units had the range A1-FE, trail units had the range 40-7E and A1-FE, and singletons had the range 21-7E.
==
The [[Unicode]] standard has two variable-width encodings: [[UTF-8]] and [[UTF-16]]
UTF-16 was devised to break free of the 65,536-character limit of the original Unicode (1.x) without breaking compatibility with the 16-bit encoding. In UTF-16, singletons have the range 0000-D7FF and E000-FFFF, lead units the range D800-DBFF and trail units the range DC00-DFFF. The lead and trail units, called in Unicode terminology high surrogates and low surrogates respectively, map 1024×1024 or 1,048,576 numbers, making for a maximum of possible 1,114,112 codepoints in Unicode.
[[Category:Character encoding]]
|