Variable-width encoding

{{about|the storage of text in computers|the transmission of data across noisy channels|variable-length code}}
{{more citations needed|date=December 2009}}
{{use dmy dates|date=January 2012}}
 
<ref>{{Cite journal|last=Crispin|first=M.|date=April 2005|title=UTF-9 and UTF-18 Efficient Transformation Formats of Unicode|url=http://dx.doi.org/10.17487/rfc4042}}</ref> A '''variable-width encoding''' is a type of [[character encoding]] scheme in which codes of differing lengths are used to encode a [[character set]] (a repertoire of symbols) for representation in a [[computer]]. The most common variable-width encodings are '''multibyte encodings''', which use varying numbers of [[byte]]s ([[octet (computing)|octets]]) to encode different characters.
(Some authors, notably in Microsoft documentation, use the term ''multibyte character set''. This is a [[misnomer]], because representation size is an attribute of the encoding, not of the character set.)
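As an illustration of "varying numbers of bytes", the short sketch below measures how many bytes a few characters occupy in a multibyte encoding and in a fixed-width one. UTF-8 (multibyte) and UTF-32 (fixed at four bytes) are chosen here only as convenient examples; this section does not name them, and the sample characters are arbitrary.

```python
# Sketch: UTF-8 stands in as an example multibyte encoding and UTF-32
# as a fixed-width one; neither is prescribed by the text above.
SAMPLES = ["A", "é", "€", "𝄞"]  # U+0041, U+00E9, U+20AC, U+1D11E


def encoded_lengths(chars, encoding):
    """Map each character to the number of bytes it occupies in `encoding`."""
    return {ch: len(ch.encode(encoding)) for ch in chars}


variable = encoded_lengths(SAMPLES, "utf-8")   # 1, 2, 3 and 4 bytes
fixed = encoded_lengths(SAMPLES, "utf-32-le")  # 4 bytes for every character

for ch in SAMPLES:
    print(f"U+{ord(ch):04X}: UTF-8 {variable[ch]} byte(s), UTF-32 {fixed[ch]} bytes")
```

In the multibyte encoding, the code for "A" is a single byte and the code for the musical symbol is four bytes. In the fixed-width encoding, every character is four bytes regardless of which one it is.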
 
Early variable-width encodings using less than a byte per character were sometimes used to pack English text into fewer bytes in [[adventure game]]s for early [[microcomputers]]. However, [[disk storage|disks]] (which, unlike tapes, allowed random access, so that text could be loaded on demand), increases in computer memory, and general-purpose [[compression algorithm]]s have rendered such tricks largely obsolete.
 
Multibyte encodings usually arise from the need to increase the number of characters that can be encoded without breaking [[backward compatibility]] with an existing constraint. For example, one byte (8 bits) per character can encode 256 possible characters. To encode more than 256 characters, the obvious choice would be to use two or more bytes per encoding unit: two bytes (16 bits) would allow 65,536 possible characters. However, such a change would break compatibility with existing systems and therefore might not be feasible at all.
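The arithmetic and the compatibility problem can be checked directly. In the sketch below, 7-bit ASCII text plays the role of the "existing" single-byte data. UTF-16 (little-endian) stands in for a scheme that uses two bytes per unit; it takes two bytes for each character of plain ASCII text like this, though not for every character. UTF-8 stands in for an ASCII-compatible multibyte encoding. These specific encodings, and the helper name `code_space`, are illustrative assumptions rather than anything defined in this article.

```python
BITS_PER_BYTE = 8


def code_space(num_bytes):
    """Number of distinct codes representable with num_bytes bytes (hypothetical helper)."""
    return 2 ** (BITS_PER_BYTE * num_bytes)


text = "Hello"
legacy = text.encode("ascii")             # existing data: one byte per character
two_byte = text.encode("utf-16-le")       # two bytes per character for this text
multibyte = text.encode("utf-8")          # ASCII-compatible multibyte encoding

print(code_space(1), code_space(2))       # 256 65536
print(two_byte == legacy)                 # False: each ASCII byte is followed by 0x00
print(multibyte == legacy)                # True: ASCII bytes are left unchanged
```

The two-byte form of "Hello" interleaves zero bytes with the original ones, so software that expects the old one-byte data misreads it. The multibyte form is byte-for-byte identical to the original for ASCII text and uses longer sequences only for additional characters.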
 
==General structure==
 
==See also==
* [[wchar_t]] wide characters
* [[Lotus Multi-Byte Character Set]] (LMBCS)
* [[Triple-Byte Character Set]] (TBCS)
{{Character encoding}}
 
==References==
{{Reflist}}
 
{{DEFAULTSORT:Variable-Width Encoding}}
 
[[Category:Character encoding]]