{{original research|date=April 2010}}
{{more citations needed|date=December 2012}}
}}
{{anchor|ASCII armor}} A '''binary-to-text encoding''' is [[code|encoding]] of [[data (computing)|data]] in [[plain text]]. More precisely, it is an encoding of binary data in a sequence of [[character (computing)|printable characters]]. These encodings are necessary for transmission of data when the [[communication channel]] does not allow binary data (such as [[email]] or [[NNTP]]) or is not [[8-bit clean]]. [[Pretty Good Privacy|PGP]] documentation ({{IETF RFC|
==Overview==
==Description==
The [[ASCII]] text-encoding standard uses 7 bits to encode characters. With this it is possible to encode 128 (i.e. 2<sup>7</sup>) unique values (0–127) to represent the alphabetic, numeric, and punctuation characters commonly used in [[English language|English]], plus a selection of [[
In contrast, most computers store data in memory organized in eight-bit [[byte]]s. Files that contain machine-executable code and non-textual data typically contain all 256 possible eight-bit byte values. Many computer programs came to rely on this distinction between seven-bit ''text'' and eight-bit ''binary'' data, and would not function properly if non-ASCII characters appeared in data that was expected to include only ASCII text. For example, if the value of the eighth bit is not preserved, the program might interpret a byte value above 127 as a flag telling it to perform some function.
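As a minimal illustration of the problem, the following Python sketch simulates a hypothetical channel that clears the eighth bit of every byte. The `strip_high_bit` helper is an assumption made for this example and does not correspond to any particular protocol.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Simulate a hypothetical 7-bit channel by clearing bit 7 of every byte."""
    return bytes(b & 0x7F for b in data)


ascii_text = b"Hi"
binary_data = bytes([0x48, 0x69, 0xC3, 0xA9, 0xFF])

# Pure ASCII survives unchanged, because every byte is already below 128.
ascii_survives = strip_high_bit(ascii_text) == ascii_text

# Bytes above 127 are silently mapped onto unrelated 7-bit values.
damaged = strip_high_bit(binary_data)
```

Here `damaged` differs from `binary_data` in its last three bytes, and the original values cannot be recovered from it.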
It is often desirable, however, to be able to send non-textual data through text-based systems, such as when one might attach an image file to an e-mail message. To accomplish this, the data is encoded in some way, such that eight-bit data is encoded into seven-bit ASCII characters (generally using only alphanumeric and punctuation characters—the
==Encoding plain text==
{{See also|Delimiter#ASCII armor|Return-to-libc attack#Protection from return-to-libc attacks}}
Binary-to-text encoding methods are also used as a mechanism for encoding [[plain text]]. For example:
* Some systems have a more limited character set they can handle; not only are they not [[8-bit clean]], some cannot even handle every printable ASCII character.
* Other systems have limits on the number of characters that may appear between
* Still others add [[header (computing)|header]]s or [[trailer (information technology)|trailer]]s to the text.
* A few poorly regarded but still-used protocols use [[in-band signaling]], causing confusion if specific patterns appear in the message. The best-known example is the string "From " (including the trailing space) at the beginning of a line, used to separate mail messages in the [[mbox]] file format.
By using a binary-to-text encoding on messages that are already plain text, then decoding on the other end, one can make such systems appear to be completely [[Transparency (telecommunication)|transparent]]. This is sometimes referred to as 'ASCII armoring'. For example, the ViewState component of [[ASP.NET]] uses [[base64]] encoding to safely transmit text via HTTP POST, in order to avoid [[delimiter collision]].
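The armoring idea can be sketched in Python with the standard `base64` module. The sample message is invented for illustration. It contains a line beginning with "From ", which an mbox reader could mistake for a message separator.

```python
import base64

# A plain-text message whose second line could be mistaken for an mbox separator.
message = "Hello,\nFrom now on the report is weekly.\n"

# Armor the text: the Base64 alphabet contains no space, so "From " cannot appear.
armored = base64.encodebytes(message.encode("ascii")).decode("ascii")

# The receiving side reverses the encoding and recovers the original text.
restored = base64.decodebytes(armored.encode("ascii")).decode("ascii")
```

After the round trip, `restored` equals `message`, while the armored form that travels through the intermediate system contains no line that starts with the problematic pattern.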
== Encoding standards ==
The table below compares the most commonly used forms of binary-to-text encodings. The efficiency listed is the ratio between the number of bits in the input and the number of bits in the encoded output.
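For encodings that draw every output character from a fixed alphabet, this ratio can be estimated with a short calculation. The Python sketch below assumes that each output character occupies one eight-bit byte and ignores padding, line breaks, and headers, so it gives an upper bound rather than the exact figure for any specific format. The function name `efficiency` is chosen for this example only.

```python
import math


def efficiency(alphabet_size: int) -> float:
    """Upper bound on efficiency: bits carried per character / 8 bits stored."""
    return math.log2(alphabet_size) / 8


# Alphabet sizes of a few encodings; the sizes follow from the encoding names.
estimates = {
    "Base16": efficiency(16),  # 4 bits per character
    "Base32": efficiency(32),  # 5 bits per character
    "Base36": efficiency(36),  # about 5.17 bits per character
    "Base62": efficiency(62),  # about 5.95 bits per character
    "Base64": efficiency(64),  # 6 bits per character
}
```

Under these assumptions the estimates come out at 50% for Base16, 62.5% for Base32, and 75% for Base64, with Base36 and Base62 at roughly 64% and 74%.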
{| class="wikitable sortable"
! Encoding !! Data type !! Efficiency !! Programming language implementations !! Comments
|-
| [[Ascii85]] || Arbitrary || 80% || [http://sites.google.com/site/dannychouinard/Home/unix-linux-trinkets/little-utilities/base64-and-base85-encoding-awk-scripts awk] {{Webarchive|url=https://web.archive.org/web/20141229031706/http://sites.google.com/site/dannychouinard/Home/unix-linux-trinkets/little-utilities/base64-and-base85-encoding-awk-scripts |date=2014-12-29 }}, [http://www.ibiblio.org/pub/packages/ccic/software/unix/utils/btoa.c C], [https://github.com/woolstar/test/blob/master/encode/asc85.c C (2)], [https://web.archive.org/web/20131227071331/http://www.codinghorror.com/blog/2005/10/c-implementation-of-ascii85.html C#], [https://web.archive.org/web/20210927102719/http://blog.wezeku.com/2010/07/01/f-ascii85-module/ F#], [https://pkg.go.dev/encoding/ascii85 Go], [https://web.archive.org/web/20160304035222/http://java.freehep.org/freehep-io/apidocs/org/freehep/util/io/ASCII85.html Java], [https://metacpan.org/pod/Convert::Ascii85 Perl], [https://docs.python.org/3/library/base64.html#base64.a85encode Python], [https://web.archive.org/web/20151208205520/https://code.google.com/p/python-mom/source/browse/mom/codec/base85.py Python (2)]
|-
| [[Base32]] || Arbitrary || 62.5% || [
|-
| [[Base36]] || Integer || data-sort-value="64%"|~64% ||
|Uses the [[Arabic numerals]] 0–9 and the [[Latin alphabet|Latin letters]] A–Z (the [[ISO basic Latin alphabet]]). Commonly used by [[URL redirection]] systems like [[TinyURL]] or SnipURL/Snipr as compact alphanumeric identifiers.
|-
|
|-
|
|-
| {{anchor|Base58}}
[[File:Original source code bitcoin-version-0.1.0 file base58.h.png|400px|thumb|Base58 in the original bitcoin source code]]
|-
| [[Base62]] || Arbitrary || ~74% || [https://github.com/fbernier/base62 Rust], [https://pypi.org/project/pybase62/ Python]|| Similar to Base64, but contains only alphanumeric characters.
|-
| [[Base64]] || Arbitrary || 75% || [http://sites.google.com/site/dannychouinard/Home/unix-linux-trinkets/little-utilities/base64-and-base85-encoding-awk-scripts awk] {{Webarchive|url=https://web.archive.org/web/20141229031706/http://sites.google.com/site/dannychouinard/Home/unix-linux-trinkets/little-utilities/base64-and-base85-encoding-awk-scripts |date=2014-12-29 }}, [
|-
| [[Base85]]
| Revised version of [[Ascii85]].
|-
| Base91<ref>{{Cite web |author=Dake He |author2=Yu Sun |author3=Zhen Jia |author4=Xiuying Yu |author5=Wei Guo |author6=Wei He |author7=Chao Qi |author8=Xianhui Lu |title=A Proposal of Substitute for Base85/64 – Base91 |url=https://www.iiis.org/CDs2010/CD2010SCI/CCCT_2010/PapersPdf/TB100QM.pdf
|-
| basE91<ref>
|-
| Base94<ref>{{cite web |
|-
| Base122<ref>{{cite web |last=Albertson |first=Kevin |date=Nov 26, 2016 |title=Base-122 Encoding |url=http://blog.kevinalbs.com/base122
|-
| BaseXML<ref>{{cite web | url=https://github.com/kriswebdev/BaseXML | title=BaseXML - for XML1.0+ | website=[[GitHub]] | date=16 March 2019 }}</ref> || Arbitrary || 83.5% || [https://github.com/kriswebdev/BaseXML C Python JavaScript] || {{space}}
|-
| {{anchor|Bech32|Bech32m}}Bech32 || Arbitrary || data-sort-value="62.5%"|62.5% + at least 8 chars (label, separator, 6-char [[error correcting code|ECC]]) || C, C++, [[JavaScript]], [[Go (programming language)|Go]], Python, [[Haskell]], [[Ruby (programming language)|Ruby]], [[Rust (programming language)|Rust]]|| Specification.<ref>{{Cite web |date=8 December 2021 |title=bitcoin/bips |url=https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#bech32
|-
| [[BinHex]] || Arbitrary || 75%|| [http://metacpan.org/module/Convert::BinHex Perl], [http://www.fpx.de/fp/Software/UUDeview/ C], [http://ibiblio.org/pub/linux/utils/compress/macutils.tar.gz C (2)] || MacOS Classic
| [[Hexadecimal#Base16 (transfer encoding)|Hexadecimal]] (Base16) || Arbitrary || 50% || Most languages || Exists in [[uppercase]] and [[Letter case#All lowercase|lowercase]] variants
|-
| [[Intel HEX]] || Arbitrary || data-sort-value="50%"|≲50% || [https://github.com/vsergeev/libGIS C library], [
|-
| [[MIME]] || Arbitrary || See [[Quoted-printable]] and [[Base64]] || See [[Quoted-printable]] and [[Base64]] || Encoding container for e-mail-like formatting
|-
| [[Percent
|-
| [[Quoted-printable]] || Text || data-sort-value="33%"|~33–100%{{efn|1= One byte stored as =XX. Encoding all but the 94 characters which don't need it (incl. space and tab).}} || Probably many || Preserves line breaks; cuts lines at 76 characters
|-
| [[S-record]] (Motorola hex) || Arbitrary || 49.6% || [https://github.com/vsergeev/libGIS C library], [
|-
| [[Tektronix hex]] || Arbitrary || || || Typically used to program [[EPROM]], [[Flash memory|NOR
|-
|[https://github.com/bchainhub/txms.js#readme TxMS]
|Arbitrary
|
|[https://github.com/bchainhub/txms.js TypeScript, CLI], [https://github.com/bchainhub/flutter_txms Dart]
|TxMS compresses binary data into a readable text format using Binary-to-Text encoding and allows reversible conversion back to hexadecimal.
|-
| [[Uuencoding]] || Arbitrary || data-sort-value="60%"|~60% ([[Uuencoding#Disadvantages|up to 70%]]) || [[Uuencoding#Support in Perl|Perl]], [http://www.fpx.de/fp/Software/UUDeview/ C], [https://github.com/MHumm/DelphiEncryptionCompendium/blob/master/Source/DECFormat.pas Delphi], [https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/sun/misc/UUEncoder.java Java], [https://docs.python.org/3/library/uu.html Python], probably many others || Largely replaced by MIME and yEnc
|-
| [[Xxencoding]] || Arbitrary || data-sort-value="75%"|~75% (similar to Uuencoding) || [http://www.fpx.de/fp/Software/UUDeview/ C], [https://github.com/MHumm/DelphiEncryptionCompendium/blob/master/Source/DECFormat.pas Delphi] || Proposed (and occasionally used) as replacement for Uuencoding to avoid character set translation problems between ASCII and the EBCDIC systems that could corrupt Uuencoded data
|-
| z85 ([https://rfc.zeromq.org/spec/32/ ZeroMQ spec:32/Z85]) || Binary & ASCII || 80% (similar to Ascii85/Base85) || [https://github.com/zeromq/rfc/blob/master/src/spec_32.c C] (original), [https://github.com/coenm/Z85e C#], [https://pub.dev/packages/z85 Dart], [https://github.com/jamesruan/z85/blob/master/src/z85.erl Erlang], [https://github.com/tilinna/z85 Go], [https://github.com/philanc/plc/blob/master/plc/base85.lua Lua], [https://github.com/fxn/z85 Ruby], [https://docs.rs/z85/latest/src/z85/lib.rs.html Rust] and others
|-
| {{IETF RFC|1751}} ([[S/KEY]]) || Arbitrary || 33% || C,<ref name="RFC1760" /> [https://www.dlitz.net/software/pycrypto/doc/#crypto-util-rfc1751 Python]
|
"A Convention for [[Human-readable]] 128-bit Keys". A series of small English words is easier for humans to read, remember, and type in than decimal or other binary-to-text encoding systems.<ref>
Some of these encodings (quoted-printable and percent-encoding) are based on a set of allowed characters and a single [[escape character]]. The allowed characters are left unchanged, while all other characters are converted into a string starting with the escape character. This kind of conversion allows the resulting text to be almost readable, in that letters and digits are part of the allowed characters, and are therefore left as they are in the encoded text. These encodings produce the shortest plain ASCII output for input that is mostly printable ASCII.
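A short Python sketch of the escape-character approach, using percent-encoding as implemented by the standard `urllib.parse` module. The sample bytes are invented for illustration. Letters and digits pass through unchanged, while every other byte becomes `%` followed by two hexadecimal digits.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Mostly printable input with a space and two non-text bytes.
raw = b"abc 123\x00\xff"

# Allowed characters are copied as-is; the rest are escaped with "%".
encoded = quote_from_bytes(raw)

# Decoding reverses the escapes and restores the original bytes.
decoded = unquote_to_bytes(encoded)
```

For this input `encoded` is `abc%20123%00%FF`, which remains largely readable because only three of the nine bytes needed escaping.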
Some other encodings ([[base64]], [[uuencoding]]) are based on mapping all possible sequences of six [[bit]]s into different printable characters. This is possible because there are more than 2<sup>6</sup> = 64 printable characters. A given sequence of bytes is translated by viewing it as a stream of bits, breaking this stream into chunks of six bits and generating the sequence of corresponding characters. The different encodings differ in the mapping between sequences of bits and characters and in how the resulting text is formatted.
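The six-bit chunking can be sketched in Python as follows. This is an illustrative implementation written for clarity rather than speed. It uses the standard Base64 alphabet and `=` padding, while uuencoding uses a different character mapping and line format. The function name `sixbit_encode` is chosen for this example only.

```python
import base64

# The standard Base64 alphabet: index i is the character for the six-bit value i.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def sixbit_encode(data: bytes) -> str:
    """Encode bytes by splitting their bit stream into six-bit chunks."""
    # View the input as one long string of bits.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zero bits so the length is a multiple of six.
    bits += "0" * (-len(bits) % 6)
    # Map each six-bit chunk to its character.
    chars = [ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6)]
    # Base64 pads the text with "=" to a multiple of four characters.
    return "".join(chars) + "=" * (-len(chars) % 4)


sample = sixbit_encode(b"Man")
matches_stdlib = all(
    sixbit_encode(d) == base64.b64encode(d).decode("ascii")
    for d in (b"", b"M", b"Ma", b"Man", bytes(range(256)))
)
```

The three bytes of `b"Man"` form 24 bits, which split into four six-bit values and yield the four characters `TWFu`.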
Some encodings (the original version of BinHex and the recommended encoding for [[CipherSaber]]) use four bits instead of six, mapping all possible sequences of 4 bits onto the 16 standard [[hexadecimal]] digits. Using 4 bits per encoded character leads to a 50% longer output than base64, but simplifies encoding and decoding—expanding each byte in the source independently to two encoded bytes is simpler than base64's expanding 3 source bytes to 4 encoded bytes.
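The size difference can be checked with a small Python sketch. The 30-byte input is arbitrary and was chosen as a multiple of three so that Base64 needs no padding.

```python
import base64

data = bytes(range(30))  # 30 arbitrary bytes

# Hexadecimal: each byte maps independently to two digits.
hex_text = "".join(f"{byte:02x}" for byte in data)

# Base64: every group of 3 bytes maps to 4 characters.
b64_text = base64.b64encode(data).decode("ascii")

# Ratio of the two output lengths.
ratio = len(hex_text) / len(b64_text)
```

The hexadecimal form is 60 characters long and the Base64 form 40, giving a ratio of 1.5, which is the 50% increase described above.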
Out of [[PETSCII]]'s first 192 codes, 164 have visible representations when quoted: 5 (white), 17–20 and 28–31 (colors and cursor controls), 32–90 (ASCII equivalent), 91–127 (graphics), 129 (orange), 133–140 (function keys), 144–159 (colors and cursor controls), and 160–192 (graphics).<ref>
== See also ==
* [[Computer number format]]
* [[Geocode]]
* [[Numeral system]]s, [[List of numeral systems#By type of notations|listed by notation type]] <!-- This is here to help readers to find encodings that may not belong in this article (e.g. programmers or cryptographers looking for something such as [[Base 26]]), since the topic of this article is _not currently
Originally was to be a
{{see also|List of numeral systems#By type of notation}}