Binary-to-text encoding: Difference between revisions

A '''binary-to-text encoding''' is an [[Character encoding|encoding]] of data in [[plain text]]. More precisely, it is an encoding of [[binary data]] in a sequence of [[ASCII]]-printable characters. These encodings are necessary for transmission of data when the channel or the protocol allows only ASCII-printable characters, as with [[e-mail]] or [[Usenet]]. [[PGP]] documentation uses the term ''[[ASCII armor]]'' for binary-to-text encoding when referring to [[Radix-64]].
 
==Description==
The ASCII text-encoding standard uses 128 unique values (0–127) to represent the alphabetic, numeric, and punctuation characters commonly used in [[English language|English]], plus a selection of 'control codes' which do not represent printable characters. For example, the capital letter ''A'' is ASCII character 65, the numeral ''2'' is ASCII 50, the character ''}'' is ASCII 125, and the [[metacharacter]] ''carriage return'' is ASCII 13. Systems based on ASCII use seven bits to represent these values digitally.
 
By contrast, most computers store data in memory organized in eight-bit [[byte]]s, and, in the case of machine-executable code and non-textual data formats where maximum storage density is desirable, use the full range of 256 possible values in each eight-bit byte. Many computer programs came to rely on this distinction between seven-bit ''text'' and eight-bit ''binary'' data, and would not function properly if non-ASCII characters appeared in data that was expected to include only ASCII text. For example, the value of the eighth bit might not be preserved, or the program might interpret a byte value above 127 as a flag telling it to perform some function.
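
The failure mode described above can be illustrated with a minimal Python sketch. The function <code>strip_high_bit</code> is a hypothetical stand-in for a seven-bit channel that does not preserve the eighth bit; it is not taken from any particular system.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Simulate a hypothetical seven-bit channel that clears the eighth bit of each byte."""
    return bytes(b & 0x7F for b in data)


# Pure ASCII text only uses values 0-127, so it passes through unchanged.
text = b"Hello"
text_out = strip_high_bit(text)

# Arbitrary binary data uses the full 0-255 range and is silently corrupted.
binary = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF])
binary_out = strip_high_bit(binary)

print(text_out == text)      # the text survives the channel
print(binary_out == binary)  # the binary data does not
```

Under this assumption, every byte above 127 arrives with a different value, which is why such data has to be re-encoded into printable characters before it is sent.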
 
==Encoding plain text==
Although this encoding method is useful for transmitting non-textual data through text-based systems, it is also used as a mechanism for encoding [[plain text]]. This is done in situations where certain plain text characters may interfere with storage or transmission requirements. This is sometimes referred to as '[[ASCII armor]]ing'.
 
Examples:
 
==Encoding standards==
The most widely used forms of binary-to-text encoding are:
 
* [[base64]]
Some of these encodings (quoted-printable and percent encoding) are based on a set of allowed characters and a single [[escape character]]. The allowed characters are left unchanged, while all other characters are converted into a string starting with the escape character. This kind of conversion leaves the resulting text almost readable, because letters and digits are among the allowed characters and therefore appear unchanged in the encoded text.
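
The escape-character scheme can be sketched in Python as follows. The names <code>ALLOWED</code>, <code>escape_encode</code> and <code>escape_decode</code> are hypothetical, and the allowed set (ASCII letters and digits only) is an assumption chosen for illustration; real formats such as quoted-printable and percent encoding define their own allowed sets and line-length rules.

```python
import string

# Assumed set of allowed characters: ASCII letters and digits only.
ALLOWED = set((string.ascii_letters + string.digits).encode("ascii"))


def escape_encode(data: bytes, escape: str = "%") -> str:
    """Leave allowed bytes unchanged; write every other byte as escape + two hex digits."""
    out = []
    for byte in data:
        if byte in ALLOWED:
            out.append(chr(byte))
        else:
            out.append(f"{escape}{byte:02X}")
    return "".join(out)


def escape_decode(text: str, escape: str = "%") -> bytes:
    """Reverse escape_encode: turn each escape sequence back into a single byte."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == escape:
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


data = "café = 100%".encode("utf-8")
encoded = escape_encode(data)
print(encoded)  # caf%C3%A9%20%3D%20100%25
```

The letters and digits of the input remain visible in the output, while the non-ASCII bytes, the spaces, the equals sign and the escape character itself are replaced by escape sequences. With <code>%</code> as the escape character, this sketch produces percent-encoded output for this particular input.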
 
Some other encodings (base64, uuencoding) are based on mapping all possible sequences of six [[bit]]s into different printable characters. Since there are more than 2<sup>6</sup>&nbsp;=&nbsp;64 printable characters, this is possible. A given sequence of bytes is translated by viewing it as a stream of bits, breaking this stream into chunks of six bits and generating the sequence of corresponding characters. The different encodings differ in the mapping between sequences of bits and characters and in how the resulting text is formatted. Some encodings (the original version of BinHex) use four bits instead of six. This makes the output 50% longer than that of a six-bit encoding but simplifies the procedure of encoding, as the byte boundaries in the source data and the character boundaries in the output line up every second output character instead of every fourth.
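
The six-bit procedure can be sketched in Python as follows. The function name <code>encode_six_bit</code> is hypothetical; the alphabet and the <code>=</code> padding are those of standard base64, and the bit-string approach is chosen for readability rather than speed. Hexadecimal notation stands in for a generic four-bit encoding and is an assumption made for the size comparison, not the actual BinHex format.

```python
import base64

# Standard base64 alphabet: one printable character for each six-bit value 0-63.
B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_six_bit(data: bytes) -> str:
    """Encode bytes by splitting their bit stream into six-bit chunks."""
    # View the input as one long stream of bits.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zero bits so the stream divides evenly into six-bit chunks.
    bits += "0" * (-len(bits) % 6)
    # Map each six-bit chunk to its character in the alphabet.
    chars = [B64_ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Pad the output to a multiple of four characters, as base64 does.
    return "".join(chars) + "=" * (-len(chars) % 4)


print(encode_six_bit(b"Man"))  # TWFu
print(encode_six_bit(b"Man") == base64.b64encode(b"Man").decode("ascii"))

# Size comparison for 300 input bytes: six bits versus four bits per character.
sample = bytes(300)
six_bit_length = len(encode_six_bit(sample))  # 4 characters per 3 bytes
four_bit_length = len(sample.hex())           # 2 characters per byte
print(six_bit_length, four_bit_length)
```

For 300 input bytes the six-bit form needs 400 characters and the four-bit form 600, which is the 50% difference mentioned above.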
 
[[Category:Binary to text encoding formats|*]]