Binary-to-text encoding

A '''binary-to-text encoding''' is an [[Character encoding|encoding]] of data in [[plain text]]. More precisely, it is an encoding of [[binary data]] as a sequence of printable [[ASCII]] characters. Such encodings are needed to transmit data over channels or protocols that allow only printable ASCII characters, such as [[e-mail]] or [[Usenet]]. [[Pretty_Good_Privacy|PGP]] documentation (RFC 2440) uses the term '''ASCII armor''' for binary-to-text encoding when referring to [[Radix-64]].
 
==Description==
The most commonly used forms of binary-to-text encoding are:
 
* [[hexadecimal]]
* [[base64]]
* [[quoted-printable]]
Some other encodings ([[base64]], [[uuencoding]]) map every possible sequence of six [[bit]]s to a distinct printable character. This is possible because there are more than 2<sup>6</sup>&nbsp;=&nbsp;64 printable ASCII characters. A given sequence of bytes is translated by treating it as a stream of bits, breaking that stream into chunks of six bits, and emitting the character that corresponds to each chunk. The individual encodings differ in how they map bit sequences to characters and in how they format the resulting text.
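The bit-stream chunking described above can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation: it assumes the standard base64 alphabet from RFC 4648, the function name <code>encode_6bit</code> is hypothetical, and it deliberately omits padding and line-length rules, which are exactly where the individual encodings differ.

```python
# Standard base64 alphabet (RFC 4648), assumed here for illustration;
# other six-bit encodings such as uuencoding use a different 64-character table.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_6bit(data: bytes, alphabet: str = ALPHABET) -> str:
    """Hypothetical helper: map each 6-bit chunk of the input bit stream
    to one character of `alphabet`.

    No '=' padding and no line wrapping are applied; a trailing partial
    chunk is filled with zero bits on the right.
    """
    bits = 0   # accumulator holding bits not yet emitted
    nbits = 0  # number of valid bits currently in the accumulator
    out = []
    for byte in data:
        # Append the next 8 bits of the stream.
        bits = (bits << 8) | byte
        nbits += 8
        # Emit as many complete 6-bit chunks as are available.
        while nbits >= 6:
            nbits -= 6
            out.append(alphabet[(bits >> nbits) & 0x3F])
            bits &= (1 << nbits) - 1  # keep only the unemitted bits
    if nbits:
        # Zero-fill the leftover 2 or 4 bits to form a final 6-bit chunk.
        out.append(alphabet[(bits << (6 - nbits)) & 0x3F])
    return "".join(out)
```

With this table, <code>encode_6bit(b"Man")</code> returns <code>"TWFu"</code>, the same result as Python's <code>base64.b64encode(b"Man")</code>; for inputs whose length is not a multiple of three, the output differs from standard base64 only by the missing <code>=</code> padding.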
 
Some encodings (the original version of BinHex and the recommended encoding for [[CipherSaber]]) use four bits instead of six, mapping every possible sequence of four bits to one of the 16 standard [[hexadecimal]] digits.
Using four bits per encoded character makes the output 50% longer than base64 (two characters per byte rather than four characters per three bytes), but it simplifies encoding and decoding: expanding each source byte independently into two encoded characters is simpler than base64's expansion of three source bytes into four encoded characters.
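A Python sketch of the four-bit (hexadecimal) scheme shows why it is simpler: each byte is handled on its own, and its high and low four bits are looked up directly, with no bit buffer carried between bytes. The function names <code>encode_hex</code> and <code>decode_hex</code> are hypothetical, and the uppercase digit set is an arbitrary choice rather than the exact output format of BinHex or CipherSaber.

```python
HEX_DIGITS = "0123456789ABCDEF"
# Reverse lookup table: digit character -> its 4-bit value.
HEX_VALUES = {ch: i for i, ch in enumerate(HEX_DIGITS)}


def encode_hex(data: bytes) -> str:
    """Hypothetical helper: expand each byte independently into two digits."""
    out = []
    for byte in data:
        out.append(HEX_DIGITS[byte >> 4])    # high 4 bits
        out.append(HEX_DIGITS[byte & 0x0F])  # low 4 bits
    return "".join(out)


def decode_hex(text: str) -> bytes:
    """Hypothetical helper: invert encode_hex, turning each digit pair into one byte."""
    if len(text) % 2 != 0:
        raise ValueError("hex text must contain an even number of digits")
    out = bytearray()
    for i in range(0, len(text), 2):
        # .upper() also accepts lowercase digits; any other character raises KeyError.
        high = HEX_VALUES[text[i].upper()]
        low = HEX_VALUES[text[i + 1].upper()]
        out.append((high << 4) | low)
    return bytes(out)
```

Because every byte becomes exactly two characters, 300 bytes of input encode to 600 characters, whereas base64's four characters per three bytes gives 400 characters for the same input, which is where the 50% difference comes from.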