→Unicode variable-width encodings: correct grave inaccuracy concerning UTF-8 |
Line 3:
(Some authors, notably in Microsoft documentation, use the term ''multibyte character set,'' which is a [[misnomer]] since representation size is an attribute of the encoding, not of the character set.)
Early variable-width encodings using less than a byte per character were sometimes used to pack English text into
Multibyte encodings usually arise from a need to increase the number of characters that can be encoded without breaking [[backward compatibility]] with an existing constraint. For example, with one byte (8 bits) per character, one can encode 256 possible characters. To encode more than 256 characters, the obvious choice would be to use two or more bytes per encoding unit: two bytes (16 bits) would allow 65,536 possible characters. Such a change, however, would break compatibility with existing systems and therefore might not be feasible at all.
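The arithmetic and the compatibility problem described above can be illustrated with a minimal Python sketch. The helper name `code_space` is hypothetical, and the choice of codecs is an assumption made for illustration: `utf-16-be` stands in for a fixed two-byte scheme (it uses exactly two bytes only for the characters shown here), and `utf-8` stands in for a variable-width encoding that keeps existing single-byte text unchanged.

```python
def code_space(num_bytes: int) -> int:
    """Return how many distinct values fit in num_bytes 8-bit bytes."""
    return 2 ** (8 * num_bytes)


text = "Hi"

# Existing single-byte representation: one byte per character.
one_byte = text.encode("ascii")

# Two bytes per character (for these characters): more code space,
# but the byte sequence no longer matches the single-byte form.
two_byte = text.encode("utf-16-be")

# Variable width: these characters still occupy one byte each,
# so the bytes are identical to the single-byte form.
variable = text.encode("utf-8")

print(code_space(1))         # 256
print(code_space(2))         # 65536
print(one_byte == variable)  # True
print(one_byte == two_byte)  # False
print(two_byte)              # b'\x00H\x00i'
```

A system that expects one byte per character would misread the two-byte form, for example by treating each zero byte as a separate character or as a string terminator, which is the kind of breakage that makes a fixed-width change infeasible.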