Binary-to-text encoding: Difference between revisions

m Include hexadecimal values for characters and add subscript 8 for those present in octal
m Undid revision 1305036282 by Bender the Bot (talk) bot error fixed
 
(40 intermediate revisions by 37 users not shown)
Line 3:
{{original research|date=April 2010}}
{{more citations needed|date=December 2012}}
{{Cleanup bare URLs|date=September 2022}}
 
}}
{{anchor|ASCII armor}} A '''binary-to-text encoding''' is [[code|encoding]] of [[data (computing)|data]] in [[plain text]]. More precisely, it is an encoding of binary data in a sequence of [[character (computing)|printable characters]]. These encodings are necessary for transmission of data when the [[communication channel]] does not allow binary data (such as [[email]] or [[NNTP]]) or is not [[8-bit clean]]. [[Pretty Good Privacy|PGP]] documentation ({{IETF RFC|9580}}) uses the term "'''ASCII armor'''" for binary-to-text encoding when referring to [[Base64]].
 
==Overview==
Line 12 ⟶ 11:
 
==Description==
The [[ASCII]] text-encoding standard uses 7 bits to encode characters. With this it is possible to encode 128 (i.e. 2<sup>7</sup>) unique values (0–127) to represent the alphabetic, numeric, and punctuation characters commonly used in [[English language|English]], plus a selection of [[Control character|control characters]] which do not represent printable characters. For example, the capital letter '''A''' is represented in 7 bits as 100 0001<sub>2</sub>, 0x41 (101<sub>8</sub>), the numeral '''2''' is 011 0010<sub>2</sub>, 0x32 (62<sub>8</sub>), the character '''<nowiki>}</nowiki>''' is 111 1101<sub>2</sub>, 0x7D (175<sub>8</sub>), and the [[control character]] '''RETURN''' is 000 1101<sub>2</sub>, 0x0D (15<sub>8</sub>). Systems based on ASCII use seven bits to represent these values digitally.
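The representations quoted above can be reproduced with a short Python sketch. It is illustrative only: the helper name <code>describe</code> is an assumption made for this example and is not part of any standard.

```python
def describe(ch: str) -> str:
    """Return the 7-bit binary, hexadecimal and octal forms of one ASCII character."""
    code = ord(ch)
    if code > 127:
        # ASCII defines only the values 0-127 (seven bits).
        raise ValueError("not a 7-bit ASCII character")
    # 07b: seven binary digits, 02X: two uppercase hex digits, o: octal.
    return f"{code:07b} 0x{code:02X} {code:o}"


# The four characters used as examples in the paragraph above.
for ch in ("A", "2", "}", "\r"):
    print(repr(ch), describe(ch))
```

The binary field is padded to seven digits because ASCII assigns no meaning to an eighth bit.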
 
In contrast, most computers store data in memory organized in eight-bit [[byte]]s. Files that contain machine-executable code and non-textual data typically contain all 256 possible eight-bit byte values. Many computer programs came to rely on this distinction between seven-bit ''text'' and eight-bit ''binary'' data, and would not function properly if non-ASCII characters appeared in data that was expected to include only ASCII text. For example, if the value of the eighth bit is not preserved, the program might interpret a byte value above 127 as a flag telling it to perform some function.
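The effect of a channel that does not preserve the eighth bit can be sketched as follows. This is a simplified model, not a description of any particular mail or news system: the function <code>strip_high_bit</code> and the sample byte values are assumptions chosen for illustration, and Base64 from the Python standard library stands in for any binary-to-text encoding.

```python
import base64


def strip_high_bit(data: bytes) -> bytes:
    """Model a 7-bit channel: clear the eighth (most significant) bit of each byte."""
    return bytes(b & 0x7F for b in data)


text = b"Hello"                                  # pure ASCII, all bytes below 128
binary = bytes([0x89, 0x50, 0x4E, 0x47, 0xFF])   # contains bytes above 127

# ASCII text passes through the channel unchanged.
assert strip_high_bit(text) == text

# Raw binary data is corrupted because the high bits are lost.
assert strip_high_bit(binary) != binary

# After binary-to-text encoding, only ASCII characters are sent,
# so the data survives the channel and can be decoded afterwards.
armored = base64.b64encode(binary)
received = strip_high_bit(armored)
assert base64.b64decode(received) == binary
```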
Line 33 ⟶ 32:
! Encoding !! Data type !! Efficiency !! Programming language implementations !! Comments
|-
| [[Ascii85]] || Arbitrary || 80% || [http://sites.google.com/site/dannychouinard/Home/unix-linux-trinkets/little-utilities/base64-and-base85-encoding-awk-scripts awk] {{Webarchive|url=https://web.archive.org/web/20141229031706/http://sites.google.com/site/dannychouinard/Home/unix-linux-trinkets/little-utilities/base64-and-base85-encoding-awk-scripts |date=2014-12-29 }}, [http://www.ibiblio.org/pub/packages/ccic/software/unix/utils/btoa.c C], [https://github.com/woolstar/test/blob/master/encode/asc85.c C (2)], [https://web.archive.org/web/20131227071331/http://www.codinghorror.com/blog/2005/10/c-implementation-of-ascii85.html C#], [https://web.archive.org/web/20210927102719/http://blog.wezeku.com/2010/07/01/f-ascii85-module/ F#], [https://pkg.go.dev/encoding/ascii85 Go], [https://web.archive.org/web/20160304035222/http://java.freehep.org/freehep-io/apidocs/org/freehep/util/io/ASCII85.html Java], [https://metacpan.org/pod/Convert::Ascii85 Perl], [https://docs.python.org/3/library/base64.html#base64.a85encode Python], [https://web.archive.org/web/20151208205520/https://code.google.com/p/python-mom/source/browse/mom/codec/base85.py Python (2)] || Several variants of this encoding exist, including [[Base85]] and [[btoa]].
|-
| [[Base32]] || Arbitrary || 62.5% || [https://sourceforge.net/projects/cyoencode/ ANSI C], [https://github.com/MHumm/DelphiEncryptionCompendium/blob/master/Source/DECFormat.pas Delphi], [https://pkg.go.dev/encoding/base32 Go], [http://commons.apache.org/codec/ Java], [https://github.com/zanaptak/BinaryToTextEncoding C# F#], [https://docs.python.org/dev/library/base64.html#base64.b32encode Python] || {{space}}
|-
| [[Base36]] || Integer || data-sort-value="64%"|~64% || bash, [[C (programming language)|C]], [[C++]], [[C Sharp (programming language)|C#]], [[Java (programming language)|Java]], [[Perl]], [[PHP]], [[Python (programming language)|Python]], Visual Basic, [[Swift (programming language)|Swift]], many others
|Uses the [[Arabic numerals]] 0–9 and the [[Latin alphabet|Latin letters]] A–Z (the [[ISO basic Latin alphabet]]). Commonly used by [[URL redirection]] systems like [[TinyURL]] or SnipURL/Snipr as compact alphanumeric identifiers.
|-
| [[Base45]] || Arbitrary || ~67% (97%{{efn|Encoding for QR code generation automatically selects the encoding to match the input character set, encoding 2 alphanumeric characters in 11 bits, and Base45 encodes 16 bits into 3 such characters. The efficiency is thus 32 bits of binary data encoded in 33 bits: 97%.}}) || [https://github.com/Dasio/base45/ Go], [https://pypi.org/project/base45/ Python] || Defined in IETF Specification RFC 9285 for including binary data compactly in a [[QR code]].<ref>{{Cite web|url=https://rfc-editor.org/rfc/rfc9285|title = The Base45 Data Encoding|date = 2022-08-11|last1 = Fältström|first1 = Patrik|last2 = Ljunggren|first2 = Fredrik|last3 = Gulik|first3 = Dirk-Willem van|quote=Even in Byte mode, a typical QR code reader tries to interpret a byte sequence as text encoded in UTF-8 or ISO/IEC 8859-1. ... Such data has to be converted into an appropriate text before that text could be encoded as a QR code. ... Base45 ... offers a more compact QR code encoding.}}</ref>
|-
| [[Base56]] || Integer || — || [http://rossduggan.ie/blog/codetry/base-56-integer-encoding-in-php/index.html PHP], [https://github.com/foss-fund/base56 Python], [https://pkg.go.dev/toolman.org/encoding/base56 Go] || A variant of Base58 encoding which further sheds the '1' and the lowercase 'o' characters in order to minimise the risk of fraud and human error.<ref>{{cite web |last=Duggan |first=Ross |date=August 18, 2009 |title=Base-56 Integer Encoding in PHP |url=http://rossduggan.ie/blog/codetry/base-56-integer-encoding-in-php/index.html}}</ref>
|-
| {{anchor|Base58}}Base58 || Integer || data-sort-value="73%"|~73% || [https://github.com/bitcoin/libbase58 C], [https://github.com/bitcoin/bitcoin/blob/master/src/base58.h C++], [https://pypi.python.org/pypi/base58 Python], [https://github.com/medo64/Medo/blob/main/src/Medo/Convert/Base58.cs C#], [https://github.com/NovaCrypto/Base58 Java] || Similar to Base64, but modified to avoid both non-alphanumeric characters (+ and /) and letters that might look ambiguous when printed (0{{snd}} zero, I{{snd}} capital i, O{{snd}} capital o and l{{snd}} lower-case L). Base58 is used to represent [[bitcoin]] addresses.<ref>{{cite web |title=Protocol documentation |url=https://en.bitcoin.it/wiki/Protocol_documentation#Addresses |website=Bitcoin Wiki |access-date=10 July 2021}}</ref> Some messaging and social media systems [[Line wrap and word wrap|break lines]] on non-alphanumeric strings. This is avoided by not using [[Percent-encoding#Reserved characters|URI reserved characters]] such as +. For [[SegWit]], it was replaced by Bech32 (see below).
[[File:Original source code bitcoin-version-0.1.0 file base58.h.png|400px|thumb|Base58 in the original bitcoin source code]]
|-
| [[Base62]] || Arbitrary || ~74% || [https://github.com/fbernier/base62 Rust], [https://pypi.org/project/pybase62/ Python]|| Similar to Base64, but contains only alphanumeric characters.
|-
| [[Base64]] || Arbitrary || 75% || [http://sites.google.com/site/dannychouinard/Home/unix-linux-trinkets/little-utilities/base64-and-base85-encoding-awk-scripts awk] {{Webarchive|url=https://web.archive.org/web/20141229031706/http://sites.google.com/site/dannychouinard/Home/unix-linux-trinkets/little-utilities/base64-and-base85-encoding-awk-scripts |date=2014-12-29 }}, [https://base64.sourceforge.net/ C], [http://www.fpx.de/fp/Software/UUDeview/ C (2)], [https://github.com/MHumm/DelphiEncryptionCompendium/blob/master/Source/DECFormat.pas Delphi], [https://pkg.go.dev/encoding/base64 Go], [https://docs.python.org/3/library/base64.html#base64.b64encode Python], many others || An early and still-popular encoding, first specified as part of {{IETF RFC|989}} in 1987
|-
| [[Base85]] ({{IETF RFC|1924}}) || Arbitrary || 80% ||[https://github.com/woolstar/test/blob/master/encode/base85.c C], [https://docs.python.org/3/library/base64.html#base64.b85encode Python], [https://code.google.com/p/python-mom/source/browse/mom/codec/base85.py Python (2)]
| Revised version of [[Ascii85]].
|-
| Base91<ref>{{Cite web |author=Dake He |author2=Yu Sun |author3=Zhen Jia |author4=Xiuying Yu |author5=Wei Guo |author6=Wei He |author7=Chao Qi |author8=Xianhui Lu |title=A Proposal of Substitute for Base85/64 – Base91 |url=https://www.iiis.org/CDs2010/CD2010SCI/CCCT_2010/PapersPdf/TB100QM.pdf |website=International Institute of Informatics and Systemics}}</ref>|| Arbitrary || 81% || [https://github.com/zanaptak/BinaryToTextEncoding C# F#] || Constant width variant
|-
| basE91<ref>{{Cite web |title=binary to ASCII text encoding |url=https://base91.sourceforge.net/ |access-date=2023-03-20 |website=basE91 |publisher=[[SourceForge]]}}</ref>|| Arbitrary || 81% || [https://sourceforge.net/projects/base91/ C, Java, PHP, 8086 Assembly, AWK], [https://github.com/zanaptak/BinaryToTextEncoding C#, F#], [https://crates.io/crates/base91 Rust] || Variable width variant
|-
| Base94<ref>{{cite web |date=April 18, 2020 |title=Convert binary data to a text with the lowest overhead |url=https://vorakl.com/articles/base94/ |website=Vorakl's notes}}</ref>|| Arbitrary || 82% || [https://github.com/vorakl/base94 Python], [https://gist.github.com/iso2022jp/4054241 C], [https://crates.io/crates/base94 Rust] || {{space}}
|-
| Base122<ref>{{cite web |last=Albertson |first=Kevin |date=Nov 26, 2016 |title=Base-122 Encoding |url=http://blog.kevinalbs.com/base122}}</ref>|| Arbitrary || 87.5% || [https://github.com/kevinAlbs/Base122 JavaScript], [https://github.com/Theelx/pybase122 Python], [https://github.com/patrickfav/base122-java Java], [https://github.com/eyaler/ztml Base125 Python and Javascript], [https://github.com/vence722/base122-go Go], [https://github.com/kevinAlbs/libbase122 C]|| {{space}}
Line 64 ⟶ 63:
| BaseXML<ref>{{cite web | url=https://github.com/kriswebdev/BaseXML | title=BaseXML - for XML1.0+ | website=[[GitHub]] | date=16 March 2019 }}</ref> || Arbitrary || 83.5% || [https://github.com/kriswebdev/BaseXML C Python JavaScript] || {{space}}
|-
| {{anchor|Bech32|Bech32m}}Bech32 || Arbitrary || data-sort-value="62.5%"|62.5% + at least 8 chars (label, separator, 6-char [[error correcting code|ECC]]) || C, C++, [[JavaScript]], [[Go (programming language)|Go]], Python, [[Haskell]], [[Ruby (programming language)|Ruby]], [[Rust (programming language)|Rust]]|| Specification.<ref>{{Cite web |date=8 December 2021 |title=bitcoin/bips |url=https://github.com/bitcoin/bips/blob/master/bip-0173.mediawiki#bech32 |website=[[GitHub]]}}</ref> Used in Bitcoin and the [[Lightning Network]].<ref>{{cite web|url=https://github.com/lightningnetwork/lightning-rfc/blob/master/11-payment-encoding.md|title=''Payment encoding'' in the Lightning RFC repo|date=2020-10-15|author=Rusty Russell|website=[[GitHub]]|author-link=Rusty Russell|display-authors=etal}}</ref> The data portion is encoded like Base32 with the possibility to check and correct up to 6 mistyped characters using the 6-character [[BCH code]] at the end, which also checks/corrects the Human Readable Part. The Bech32m variant has a subtle change that makes it more resilient to changes in length.<ref>{{cite web|url=https://github.com/sipa/bips/blob/bip-bech32m/bip-0350.mediawiki|title=Bech32m format for v1+ witness addresses|website=[[GitHub]]|date=5 December 2021}}</ref>
|-
| [[BinHex]] || Arbitrary || 75%|| [http://metacpan.org/module/Convert::BinHex Perl], [http://www.fpx.de/fp/Software/UUDeview/ C], [http://ibiblio.org/pub/linux/utils/compress/macutils.tar.gz C (2)] || MacOS Classic
Line 72 ⟶ 71:
| [[Hexadecimal#Base16 (transfer encoding)|Hexadecimal]] (Base16) || Arbitrary || 50% || Most languages || Exists in [[uppercase]] and [[Letter case#All lowercase|lowercase]] variants
|-
| [[Intel HEX]] || Arbitrary || data-sort-value="50%"|≲50% || [https://github.com/vsergeev/libGIS C library], [https://srecord.sourceforge.net/ C++] || Typically used to program [[EPROM]], [[Flash memory|NOR flash]] memory chips
|-
| [[MIME]] || Arbitrary || See [[Quoted-printable]] and [[Base64]] || See [[Quoted-printable]] and [[Base64]] || Encoding container for e-mail-like formatting
Line 80 ⟶ 79:
| [[Quoted-printable]] || Text || data-sort-value="33%"|~33–100%{{efn|1= One byte stored as =XX. Encoding all but the 94 characters which don't need it (incl. space and tab).}} || Probably many || Preserves line breaks; cuts lines at 76 characters
|-
| [[S-record]] (Motorola hex) || Arbitrary || 49.6% || [https://github.com/vsergeev/libGIS C library], [https://srecord.sourceforge.net/ C++] || Typically used to program [[EPROM]], [[Flash memory|NOR flash]] memory chips. 49.6% assumes 255 binary bytes per record.
|-
| [[Tektronix hex]] || Arbitrary || || || Typically used to program [[EPROM]], [[Flash memory|NOR flash]] memory chips.
|-
|[https://github.com/bchainhub/txms.js#readme TxMS]
|Arbitrary
|
|[https://github.com/bchainhub/txms.js TypeScript, CLI], [https://github.com/bchainhub/flutter_txms Dart]
|TxMS converts binary data into a readable text format using binary-to-text encoding and allows reversible conversion back to hexadecimal.
|-
| [[Uuencoding]] || Arbitrary || data-sort-value="60%"|~60% ([[Uuencoding#Disadvantages|up to 70%]]) || [[Uuencoding#Perl|Perl]], [http://www.fpx.de/fp/Software/UUDeview/ C], [https://github.com/MHumm/DelphiEncryptionCompendium/blob/master/Source/DECFormat.pas Delphi], [https://github.com/frohoff/jdk8u-dev-jdk/blob/master/src/share/classes/sun/misc/UUEncoder.java Java], [https://docs.python.org/3/library/uu.html Python], probably many others || An early encoding developed in 1980 for [[Unix-to-Unix Copy]]. Largely replaced by MIME and [[yEnc]]
|-
| [[Xxencoding]] || Arbitrary || data-sort-value="75%"|~75% (similar to Uuencoding) || [http://www.fpx.de/fp/Software/UUDeview/ C], [https://github.com/MHumm/DelphiEncryptionCompendium/blob/master/Source/DECFormat.pas Delphi] || Proposed (and occasionally used) as replacement for Uuencoding to avoid character set translation problems between ASCII and the EBCDIC systems that could corrupt Uuencoded data
Line 113 ⟶ 118:
Some encodings (the original version of BinHex and the recommended encoding for [[CipherSaber]]) use four bits instead of six, mapping all possible sequences of 4 bits onto the 16 standard [[hexadecimal]] digits. Using 4 bits per encoded character leads to a 50% longer output than base64, but simplifies encoding and decoding—expanding each byte in the source independently to two encoded bytes is simpler than base64's expanding 3 source bytes to 4 encoded bytes.
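The size trade-off between 4-bit and 6-bit encodings can be checked with a minimal Python sketch. It uses only the standard library <code>base64</code> module; the helper <code>to_hex</code> and the 30-byte sample input are assumptions made for this illustration.

```python
import base64


def to_hex(data: bytes) -> str:
    """Expand each source byte independently into two hexadecimal digits."""
    return "".join(f"{b:02X}" for b in data)


data = bytes(range(30))            # 30 arbitrary bytes (a multiple of 3, so no Base64 padding)

hex_text = to_hex(data)            # 2 characters per byte  -> 60 characters
b64_text = base64.b64encode(data)  # 4 characters per 3 bytes -> 40 characters

print(len(hex_text), len(b64_text), len(hex_text) / len(b64_text))

# Both encodings are reversible.
assert bytes.fromhex(hex_text) == data
assert base64.b64decode(b64_text) == data
```

For this input the hexadecimal text has 60 characters and the Base64 text has 40, a ratio of 1.5, which matches the 50% figure above. The hexadecimal helper needs no grouping or padding logic because each byte is handled on its own.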
 
Out of [[PETSCII]]'s first 192 codes, 164 have visible representations when quoted: 5 (white), 17–20 and 28–31 (colors and cursor controls), 32–90 (ASCII equivalent), 91–127 (graphics), 129 (orange), 133–140 (function keys), 144–159 (colors and cursor controls), and 160–192 (graphics).<ref>{{Cite web |title=Commodore 64 PETSCII codes |url=https://sta.c64.org/cbm64pet.html |website=sta.c64.org}}</ref> This theoretically permits encodings, such as base128, between PETSCII-speaking machines.
 
== See also ==
Line 121 ⟶ 126:
* [[Computer number format]]
* [[Geocode]]
* [[Numeral system]]s, [[List of numeral systems#By type of notations|listed by notation type]] <!-- This is here to help readers to find encodings that may not belong in this article (e.g. programmers or cryptographers looking for something such as [[Base 26]]), since the topic of this article is _not currently "Data-to-text encodings", but rather "Binary-to-text encodings"

Originally was to be a 'hatnote', viz:
{{see also|List of numeral systems#By type of notation}}