Content deleted Content added
m →[[CJK]] multibyte encodings: - copy-editting |
m →Unicode variable-width encodings: copy-edit |
||
Line 22:
==Unicode variable-width encodings==
The [[Unicode]] standard has two variable-width encodings: [[UTF-8]] and [[UTF-16]] (it also has a fixed-width encoding, [[UTF-32]]). Originally, both Unicode and [[ISO 10646|ISO 10646]] standards were meant to be fixed-width, with
UTF-16 was devised to break free of the 65,536-character limit of the original Unicode (1.x) without breaking compatibility with the 16-bit encoding. In UTF-16, singletons have the range 0000-D7FF and E000-FFFF, lead units the range D800-DBFF and trail units the range DC00-DFFF. The lead and trail units, called in Unicode terminology high surrogates and low surrogates respectively, map 1024×1024 or 1,048,576 numbers, making for a maximum of possible 1,114,112 codepoints in Unicode.
|