Comparison of data-serialization formats: Difference between revisions

Content deleted Content added
DavidL (talk | contribs)
Overview: update: CBOR RFC 8949 (dec 2020)
m resize table font
 
(84 intermediate revisions by 46 users not shown)
Line 1:
{{Short description|None}}
This is a '''comparison of [[data serialization|data-serialization]] [[file format|format]]s''', various ways to convert complex [[object (computer science)|object]]s to sequences of [[bit]]s. It does not include [[markup language]]s used exclusively as [[document file format]]s.
This is a '''comparison of [[data serialization]] formats''', various ways to convert complex [[object (computer science)|object]]s to sequences of [[bit]]s. It does not include [[markup language]]s used exclusively as [[document file format]]s.
 
==Overview==
 
{| class="wikitable sortable mw-collapsible"
{{sort-under}}
{{sticky table start}}
{| class="wikitable sortable sort-under sticky-table-head" style="font-size:75%"
|-
! Name
Line 11 ⟶ 15:
! [[Binary format|Binary]]?
! [[Human-readable]]?
! Supports [[reference (computer science)|reference]]s?{{ref|stdrefs|e}}
! Schema-[[interface description language|IDL]]?
! Standard [[API]]s
! Supports [[zero-copy]] operations
|-
| [[Apache Arrow]]
| [[Apache Software Foundation]]
| {{n/a}}
| {{partial|''De facto''}}
| [https://arrow.apache.org/docs/format/Columnar.html Arrow Columnar Format]
| {{yes}}
| {{no}}
| {{yes}}
| {{yes|Built-in}}
| C, C++, C#, Go, Java, JavaScript, Julia, Matlab, Python, R, Ruby, Rust, Swift
| {{yes}}
|-
| [[Apache Avro]]
Line 22 ⟶ 38:
| [https://avro.apache.org/docs/current/spec.html Apache Avro™ Specification]
| {{yes}}
| {{partial}}{{ref|avrojson|jg}}
| {{n/a}}
| {{yes|Built-in}}
| C, C#, C++, Java, PHP, Python, Ruby
| {{n/a}}
Line 32 ⟶ 48:
| {{n/a}}
| {{no}}
| [https://parquet.apache.org Apache Parquet]
| {{yes}}
| {{no}}
Line 39 ⟶ 55:
| Java, Python, C++
| {{no}}
|-
| [[Apache Thrift]]
| [[Facebook]] (creator)<br>[[Apache Software Foundation|Apache]] (maintainer)
| {{n/a}}
| {{no}}
| [http://thrift.apache.org/static/files/thrift-20070401.pdf Original whitepaper]
| {{yes}}
| {{partial}}{{ref|thrifttxt|c}}
| {{no}}
| {{yes|Built-in}}
| C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi and other languages<ref>[https://thrift.apache.org/ Apache Thrift]</ref>
| {{n/a}}
|-
| [[ASN.1]]
Line 44 ⟶ 72:
| {{n/a}}
| {{yes}}
| ISO/IEC 8824 / ITU-T X.680 (syntax) and ISO/IEC 8825 / ITU-T X.690 (encoding rules) series. X.680, X.681, and X.683 define syntax and semantics.
| ISO/IEC 8824; X.680 series of ITU-T Recommendations
| {{yes|[[Basic Encoding Rules|BER]], [[Distinguished Encoding Rules|DER]], [[Packed Encoding Rules|PER]], [[Octet encoding rules|OER]], or custom via [[Encoding Control Notation|ECN]]}}
| {{yes|[[XML Encoding Rules|XER]], [[JSON encoding rules|JER]], [[Generic String Encoding Rules|GSER]], or custom via [[Encoding Control Notation|ECN]]}}
| {{yes}}{{ref|asn1refs|f}}
| {{yes|Built-in}}
| {{n/a}}
| {{yes|[[Octet encoding rules|OER]]}}
|-
| [[Bencode]]
| [[Bram Cohen]] (creator)<br>[[BitTorrent, Inc.]] (maintainer)
| {{n/a}}
| {{yes|''De facto'' as {{abbr|BEP|BitTorrent Enhancement Proposal}}}}
| Part of [http://bittorrent.org/beps/bep_0003.html BitTorrent protocol specification]
| {{partial|Except numbers and delimiters, being ASCII}}
| {{no}}
| {{no}}
Line 64 ⟶ 92:
| {{no}}
|-
| [[BSON]]
| [[Binn (serialization format)|Binn]]
| [[MongoDB]]
| Bernardo Ramos
| [[JSON]]
| {{n/a}}
| {{no}}
| [http://bsonspec.org BSON Specification]
| {{yes}}
| {{no}}
Line 74 ⟶ 102:
| {{no}}
| {{no}}
| {{no}}
|-
| [[Cap'n Proto]]
| [[Blink (serialization format)|Blink]]
| Kenton Varda
| Pantor Engineering
| {{n/a}}
| {{no}}
| [https://capnproto.org/encoding.html Cap'n Proto Encoding Spec]
| {{yes}}
| {{partial}}{{ref|capnptextformat|h}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
|-
| BJSON
| Roman Pietrzak, Sylwester Wysocki
| [[JSON]]
| {{no}}
| [http://bjson.org BJSON Specification]
| {{yes}}
| {{no}}
| {{no}}
| {{no}}
| {{yes}}<br>C, C++, node.js
| {{yes}}
|-
| [[BSON]]
| [[MongoDB]]
| [[JSON]]
| {{no}}
| [http://bsonspec.org BSON Specification]
| {{yes}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
| {{no}}
|-
| [[CBOR]]
| Carsten Bormann, [[Paul Hoffman (engineer)|P. Hoffman]]
| [[MessagePack]]<ref>{{cite web|url=https://github.com/msgpack/msgpack/issues/258#issuecomment-449978394|title=CBOR relationship with msgpack|first1=Carsten|last1=Bormann|website=[[GitHub]] |date=2018-12-26|access-date=2023-08-14}}</ref>
| [[JSON]] (loosely)
| {{yes}}
| RFC 8949
| {{yes}}
| {{no}}
| {{yes}}, through tagging
| {{yes|[https://tools.ietf.org/html/rfc8610 CDDL]}}
| {{yes|[[FIDO_Alliance|FIDO2]]}}
| {{no}}
| {{no}}
|-
Line 127 ⟶ 131:
| RFC author:<br>Yakov Shafranovich
| {{n/a}}
| {{partial|Myriad informal variants used}}
| RFC 4180<br>(among others)
| {{no}}
Line 145 ⟶ 149:
| {{yes}}
| {{yes}}
| Ada, C, C++, Java, Cobol, Lisp, Python, Ruby, Smalltalk
| {{n/a}}
|-
Line 157 ⟶ 161:
| {{no}}
| {{partial}}<br>(Signature strings)
| {{yes|[[D-Bus|Yes]]}}
| {{n/a}}
|-
| [[Efficient XML Interchange]] (EXI)
| [[World Wide Web Consortium|W3C]]
| Cognitect
| [[XML]], Efficient XML
| {{Yes}}
| [https://www.w3.org/TR/exi/ Efficient XML Interchange (EXI) Format 1.0]
| {{Yes}}
| {{yes|[[XML]]}}
| {{Yes|[[XPointer]], [[XPath]]}}
| {{Yes|[[XML Schema (W3C)|XML Schema]]}}
| {{Yes|[[Document Object Model|DOM]], [[Simple API for XML|SAX]], [[StAX]], [[XQuery]], [[XPath]]}}
| {{n/a}}
|-
| [[Extensible Data Notation]] (edn)
| [[Rich Hickey]] / Clojure community
| [[Clojure]]
| {{yes}}
| [https://github.com/edn-format/edn Official edn spec]
| {{no}}
| {{yes}}
| {{no}}
| {{no}}
| Clojure, Ruby, Go, C++, Javascript, Java, CLR, ObjC, Python<ref>{{cite web|url=https://github.com/edn-format/edn/wiki/Implementations|title=Implementations|website=[[GitHub]] }}</ref>
| {{no}}
| {{n/a}}
|-
| [[Efficient XML Interchange]] (EXI)
| [[World Wide Web Consortium|W3C]]
| [[XML]], [https://www.agiledelta.com/product_efx.html Efficient XML]
| {{Yes}}
| [https://www.w3.org/TR/exi/ Efficient XML Interchange (EXI) Format 1.0]
| {{Yes}}
| {{yes}}<br>([[XML]])
| {{Yes}}<br>([[XPointer]], [[XPath]])
| {{Yes}}<br>([[XML Schema (W3C)|XML Schema]])
| {{Yes}}<br>([[Document Object Model|DOM]], [[Simple API for XML|SAX]], [[StAX]], [[XQuery]], [[XPath]])
| {{n/a}}
|-
| [[FlatBuffers]]
Line 188 ⟶ 192:
| {{n/a}}
| {{no}}
| [https://google.github.io/flatbuffers/ Flatbuffers GitHub page]
| {{yes}}
| {{yes|[[Apache Arrow]]}}
| {{partial}}<br>(internal to the buffer)
| {{yes|[https://google.github.io/flatbuffers/flatbuffers_guide_writing_schema.html Yes]}}
| C++, Java, C#, Go, Python, Rust, JavaScript, PHP, C, Dart, Lua, TypeScript
| {{yes}}
Line 203 ⟶ 207:
| {{yes}}
| {{no}}
| {{yes|[[XPointer]], [[XPath]]}}
| {{yes|[[XML schema]]}}
| {{yes|[[Document Object Model|DOM]], [[Simple API for XML|SAX]], [[XQuery]], [[XPath]]}}
| {{n/a}}
|-
Line 228 ⟶ 232:
| {{yes}}
| {{no}}
| {{Yes|[https://amzn.github.io/ion-schema/ Ion schema]}}
| C, C#, Go, Java, JavaScript, Python, Rust
| {{no}}
| {{n/a}}
|-
Line 248 ⟶ 252:
| [[JavaScript syntax]]
| {{yes}}
| [https://tools.ietf.org/html/std90 STD 90]/RFC 8259<br>(ancillary:<br>RFC 6901,<br>RFC 6902), [http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-404.pdf ECMA-404], [https://www.iso.org/standard/71616.html ISO/IEC 21778:2017]
| {{no}}, but see [[BSON]], [[Smile (data interchange format)|Smile]], [[UBJSON]]
| {{yes}}
| {{yes|[https://tools.ietf.org/html/rfc6901 JSON Pointer (RFC{{nbsp}}6901)], or alternately, [http://goessner.net/articles/JsonPath/ JSONPath], [https://web.archive.org/web/20120922110739/http://bluelinecity.com/software/jpath/ JPath], [https://web.archive.org/web/20121203081945/http://www.jspon.org/ JSPON], [https://github.com/lloyd/JSONSelect json:select()]; and [[JSON-LD]]}}
| {{partial}}<br>([http://json-schema.org/ JSON Schema Proposal], [[ASN.1]] with [[JSON encoding rules|JER]], [http://www.kuwata-lab.com/kwalify/ Kwalify] {{Webarchive|url=https://web.archive.org/web/20210812231831/http://www.kuwata-lab.com/kwalify/ |date=2021-08-12 }}, [http://rjbs.manxome.org/rx/ Rx]), [[JSON-LD]]
| {{partial}}<br>([https://github.com/dscape/clarinet Clarinet], [https://www.sitepen.com/blog/jsonquery-data-querying-beyond-jsonpath JSONQuery] / [https://www.sitepen.com/blog/resource-query-language-a-query-language-for-the-web-nosql RQL], [http://goessner.net/articles/JsonPath/ JSONPath]), [[JSON-LD]]
| {{no}}
Line 273 ⟶ 277:
| {{no}}
| [http://cr.yp.to/proto/netstrings.txt netstrings.txt]
| {{partial|Except ASCII delimiters}}
| {{yes}}
| {{no}}
Line 285 ⟶ 289:
| {{no}}
| [http://ogdl.org/spec/ Specification]
| {{yes|[http://ogdl.org/spec/binary.html Binary specification]}}
| {{yes}}
| {{yes|[http://ogdl.org/spec/path.html Path specification]}}
| {{yes|[http://ogdl.org/spec/schema.html Schema WD]}}
|
| {{n/a}}
|-
| [[OPC Unified Architecture|OPC-UA Binary]]
| [[OPC Foundation]]
| {{n/a}}
Line 313 ⟶ 317:
| {{yes}}
| {{no}}
| {{yes|[http://openddl.org/ OpenDDL library]}}
| {{n/a}}
|-
Line 331 ⟶ 335:
| [[Guido van Rossum]]
| [[Python (programming language)|Python]]
| {{yes|''De facto'' as [[Python Enhancement Proposal|PEP]]s}}
| [https://www.python.org/dev/peps/pep-3154/ PEP 3154 -- Pickle protocol version 4]
| {{yes}}
| {{no}}
| {{yes}}<ref>[https://github.com/python/cpython/blob/v3.9.0/Lib/pickle.py#L137-L144 cpython/Lib/pickle.py]</ref>
| {{no}}
| {{yes}}
| {{yes}}<br>([https://www.python.org/dev/peps/pep-3154/])
| {{no}}
|-
Line 356 ⟶ 360:
| {{n/a}}
| {{no}}
| [https://developers.google.com/protocol-buffers/docs/encoding Developer Guide: Encoding], [https://developers.google.com/protocol-buffers/docs/reference/proto2-spec proto2 specification], and [https://developers.google.com/protocol-buffers/docs/reference/proto3-spec proto3 specification]
| {{yes}}
| {{yes}}{{ref|pbtextformat|d}}
| {{no}}
| {{yes|Built-in}}
| C++, Java, C#, Python, Go, Ruby, Objective-C, C, Dart, Perl, PHP, R, Rust, Scala, Swift, Julia, Erlang, D, Haskell, ActionScript, Delphi, Elixir, Elm, GopherJS, Haxe, JavaScript, Kotlin, Lua, Matlab, Mercury, OCaml, Prolog, Solidity, TypeScript, Vala, Visual Basic
| {{no}}
|-
| [[Ethereum]] Recursive Length Prefix (RLP)
| [[Ethereum]]
| {{n/a}}
| {{no}}
| [https://github.com/ethereum/wiki/wiki/RLP Specification]
| {{yes}}
| {{no}}
| {{no}}
| {{no}}
| Erlang, Go, Java, Javascript, Kotlin, Objective-C, Python, Swift, PHP
| {{yes}}
|-
| {{nobr|[[S-expression]]s}}
| [[John McCarthy (computer scientist)|John McCarthy]] (original)<br>[[Ron Rivest]] (internet draft)
| [[Lisp (programming language)|Lisp]], [[Netstring]]s
| {{partial|Largely ''de facto''}}
| [http://people.csail.mit.edu/rivest/Sexp.txt "S-Expressions"] {{Webarchive|url=https://web.archive.org/web/20131007024815/http://people.csail.mit.edu/rivest/Sexp.txt |date=2013-10-07 }} [[Internet Draft]]
| {{yes}}, ''canonical representation''
| {{yes}}, ''advanced transport representation''
| {{no}}
| {{no}}
Line 395 ⟶ 387:
| {{yes}}
| {{no}}
| {{yes}}
| {{partial}}<br>([http://json-schema.org/ JSON Schema Proposal], other JSON schemas/IDLs)
| {{partial}}<br>(via JSON APIs implemented with Smile backend, on Jackson, Python)
Line 407 ⟶ 399:
| {{partial}}<br>({{nobr|[[Efficient XML Interchange]]}}, {{nobr|[[Binary XML]]}}, {{nobr|[[Fast Infoset]]}}, [[Message Transmission Optimization Mechanism|MTOM]], {{nobr|[[XSD]] base64 data}})
| {{yes}}
| {{yes|Built-in id/ref, [[XPointer]], [[XPath]]}}
| {{yes|[[WSDL]], [[XML schema]]}}
| {{yes|[[Document Object Model|DOM]], [[Simple API for XML|SAX]], [[XQuery]], [[XPath]]}}
| {{n/a}}
|-
Line 422 ⟶ 414:
| {{no}}
|
| {{n/a}}
|-
| [[Apache Thrift]]
| [[Facebook]] (creator)<br>[[Apache Software Foundation|Apache]] (maintainer)
| {{n/a}}
| {{no}}
| [http://thrift.apache.org/static/files/thrift-20070401.pdf Original whitepaper]
| {{yes}}
| {{partial}}{{ref|thrifttxt|c}}
| {{no}}
| {{yes}} (built-in)
| C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi and other languages<ref>[https://thrift.apache.org/ Apache Thrift]</ref>
| {{n/a}}
|-
Line 440 ⟶ 420:
| [[JSON]], [[BSON]]
| {{no}}
| [http://ubjson.org/ ubjson.org]
| {{yes}}
| {{no}}
Line 452 ⟶ 432:
| {{n/a}}
| {{yes}}
| [https://tools.ietf.org/html/std67 STD 67]/RFC 4506
| {{yes}}
| {{no}}
Line 467 ⟶ 447:
| {{partial}}<br>({{nobr|[[Efficient XML Interchange]]}}, {{nobr|[[Binary XML]]}}, {{nobr|[[Fast Infoset]]}}, {{nobr|[[XSD]] base64 data}})
| {{yes}}
| {{yes|[[XPointer]], [[XPath]]}}
| {{yes|[[XML schema]], [[RELAX NG]]}}
| {{yes|[[Document Object Model|DOM]], [[Simple API for XML|SAX]], [[XQuery]], [[XPath]]}}
| {{n/a}}
|-
Line 476 ⟶ 456:
| [[XML]]
| {{no}}
| [http://xmlrpc.scripting.com/spec.md XML-RPC Specification]
| {{no}}
| {{yes}}
Line 486 ⟶ 466:
| [[YAML]]
| Clark Evans,<br>Ingy döt Net,<br>and Oren Ben-Kiki
| [[C (programming language)|C]], [[Java (programming language)|Java]], [[Perl]], [[Python (programming language)|Python]], [[Ruby (programming language)|Ruby]], [[Email]], [[HTML]], [[MIME]], [[URI]], [[XML]], [[Simple API for XML|SAX]], [[SOAP]], [[JSON]]<ref>{{cite web|url=http://yaml.org/spec/1.2/spec.html#id2708710|title=YAML Ain't Markup Language (YAML) Version 1.2|first1=Oren |last1=Ben-Kiki |first2=Clark |last2=Evans |first3=Ingy döt |last3=Net|date=2009-10-01|work=The Official YAML Web Site|access-date=2012-02-10}}</ref>
| {{no}}
| [http://www.yaml.org/spec/1.2/spec.html Version 1.2]
Line 492 ⟶ 472:
| {{yes}}
| {{yes}}
| {{partial}}<br>([http://www.kuwata-lab.com/kwalify/ Kwalify] {{Webarchive|url=https://web.archive.org/web/20210812231831/http://www.kuwata-lab.com/kwalify/ |date=2021-08-12 }}, [http://rjbs.manxome.org/rx/ Rx], built-in language type-defs)
| {{no}}
| {{no}}
Line 503 ⟶ 483:
! [[Binary format|Binary]]?
! [[Human-readable]]?
! Supports [[reference (computer science)|reference]]s?{{ref|stdrefs|e}}
! Schema-[[interface description language|IDL]]?
! Standard [[API]]s
! Supports [[zero-copy]] operations
|}
{{sticky table end}}
*a. {{note|plbin}}The current default format is binary.
 
*b. {{note|pltxt}}The "classic" format is plain text, and an XML format is also supported.
{{ordered list
*c. {{note|thrifttxt}}Theoretically possible due to abstraction, but no implementation is included.
| list-style-type=lower-alpha
*d. {{note|pbtextformat}}The primary format is binary, but a text format is available.<ref>{{cite web|url=https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.text_format|title=text_format.h - Protocol Buffers|website=Google Developers}}</ref>
| {{note|plbin}}The current default format is binary.
*e. {{note|stdrefs}}Means that generic tools/libraries know how to encode, decode, and dereference a reference to another piece of data in the same document. A tool may require the [[Interface description language|IDL]] file, but no more. Excludes custom, non-standardized referencing techniques.
| {{note|pltxt}}The "classic" format is plain text, and an XML format is also supported.
*f. {{note|asn1refs}}ASN.1 does offer [[Object identifier|OIDs]], a standard format for globally unique identifiers, as well as a standard notation ("absolute reference") for referencing a component of a value. Thus it would be possible to reference a component of an encoded value present in a document by combining an OID (assigned to the document) and an "absolute reference" to the component of the value. However, there is no standard way to indicate that a field contains such an absolute reference. Therefore, a generic ASN.1 tool/library cannot automatically encode/decode/resolve references within a document without help from custom-written program code.
| {{note|thrifttxt}}Theoretically possible due to abstraction, but no implementation is included.
*g. {{note|vpack1refs}}VelocyPack offers a value type to store pointers to other VPack items. It is allowed if the VPack data resides in memory, but not if stored on disk or sent over a network.
| {{note|pbtextformat}}The primary format is binary, but text and JSON formats are available.<ref>{{cite web|url=https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.text_format|title=text_format.h - Protocol Buffers|website=Google Developers}}</ref><ref>{{cite web|url=https://developers.google.com/protocol-buffers/docs/proto3#json|title=JSON Mapping - Protocol Buffers|website=Google Developers}}</ref>
| {{note|stdrefs}}Means that generic tools/libraries know how to encode, decode, and dereference a reference to another piece of data in the same document. A tool may require the [[Interface description language|IDL]] file, but no more. Excludes custom, non-standardized referencing techniques.
*i. {{note|fbetextformat}}The primary format is binary, but text and json formats are available.<ref>{{cite web|url=https://github.com/chronoxor/FastBinaryEncoding|title=Fast Binary Encoding is ultra fast and universal serialization solution for C++, C#, Go, Java, JavaScript, Kotlin, Python, Ruby: chronoxor/FastBinaryEncoding|date=2 April 2019|via=GitHub}}</ref>
| {{note|asn1refs}}ASN.1 has X.681 (Information Object System), X.682 (Constraints), and X.683 (Parameterization) that allow for the precise specification of open types where the types of values can be identified by integers, by [[Object identifier|OIDs]], etc. OIDs are a standard format for globally unique identifiers, as well as a standard notation ("absolute reference") for referencing a component of a value. For example, PKIX uses such notation in RFC 5912. With such notation (constraints on parameterized types using information object sets), generic ASN.1 tools/libraries can automatically encode/decode/resolve references within a document.
*j. {{note|avrojson}}The primary format is binary, a json encoder is available.<ref>{{cite web|url=https://avro.apache.org/docs/1.9.2/spec.html#json_encoding|title=Avro Json Format}}</ref>
| {{note|avrojson}}The primary format is binary, a json encoder is available.<ref>{{cite web|url=https://avro.apache.org/docs/1.9.2/spec.html#json_encoding|title=Avro Json Format}}</ref>
| {{note|capnptextformat}}The primary format is binary, but a text format is available.
}}
 
==Syntax comparison of human-readable formats==
 
{| class="wikitable"
{{sticky table start}}
{| class="wikitable sortable sort-under sticky-table-head" style="font-size:75%"
|-
! Format
Line 578 ⟶ 563:
A to Z,1,2,3</pre>
|-
| [[Extensible Data Notation|edn]]
! Format
| <code>nil</code>
! [[Nullable type|Null]]
| <code>true</code>
! [[Boolean data type|Boolean]] true
| <code>false</code>
! [[Boolean data type|Boolean]] false
| <code>685230</code><br><code>-685230</code>
! [[Integer (computer science)|Integer]]
| <code>6.8523015e+5</code>
! [[Floating-point]]
| <code>"A to Z"</code>, <code>"A \"up to\" Z"</code>
! [[String (computer science)|String]]
| <code>[true nil -42.1e7 "A to Z"]</code>
! [[Array data type|Array]]
| <code>{:kw 1, "42" true, "A to Z" [1 2 3]}</code>
! [[Associative array]]/[[Object (computer science)|Object]]
|-
| [[Ion (Serialization format)|Ion]]
Line 661 ⟶ 646:
true
"A to Z", (1, 2, 3)</pre>
|-
! Format
! [[Nullable type|Null]]
! [[Boolean data type|Boolean]] true
! [[Boolean data type|Boolean]] false
! [[Integer (computer science)|Integer]]
! [[Floating-point]]
! [[String (computer science)|String]]
! [[Array data type|Array]]
! [[Associative array]]/[[Object (computer science)|Object]]
|-
| [[OpenDDL]]
Line 787 ⟶ 762:
[extensionFieldThatIsAnEnum]: EnumValue
</syntaxhighlight>
|-
! Format
! [[Nullable type|Null]]
! [[Boolean data type|Boolean]] true
! [[Boolean data type|Boolean]] false
! [[Integer (computer science)|Integer]]
! [[Floating-point]]
! [[String (computer science)|String]]
! [[Array data type|Array]]
! [[Associative array]]/[[Object (computer science)|Object]]
|-
| [[S-expression]]s
Line 880 ⟶ 845:
</struct></syntaxhighlight>
|}
{{sticky table end}}
{{ordered list
| list-style-type=lower-alpha
| {{note|guess}}Omitted XML elements are commonly decoded by [[XML data binding]] tools as NULLs. Shown here is another possible encoding; [[XML schema]] does not define an encoding for this datatype.
| {{note|csvguess}}The RFC CSV specification only deals with delimiters, newlines, and quote characters; it does not directly deal with serializing programming [[data structure]]s.
| {{note|netguess}}The [[netstring]]s specification only deals with nested [[byte string]]s; anything else is outside the scope of the specification.
| {{note|phpfloat}}PHP will unserialize any floating-point number correctly, but will serialize them to their full decimal expansion. For example, 3.14 will be serialized to {{val|3.140000000000000124344978758017532527446746826171875}}.
| {{note|xmlguess}}[[XML data binding]]s and [[SOAP]] serialization tools provide type-safe XML serialization of programming [[data structure]]s into XML. Shown are XML values that can be placed in XML elements and attributes.
| {{note|lispstd}}This syntax is not compatible with the Internet-Draft, but is used by some dialects of [[Lisp (programming language)|Lisp]].
}}
 
*a. {{note|guess}}Omitted XML elements are commonly decoded by [[XML data binding]] tools as NULLs. Shown here is another possible encoding; [[XML schema]] does not define an encoding for this datatype.
*b. {{note|csvguess}}The RFC CSV specification only deals with delimiters, newlines, and quote characters; it does not directly deal with serializing programming [[data structure]]s.
*c. {{note|netguess}}The [[netstring]]s specification only deals with nested [[byte string]]s; anything else is outside the scope of the specification.
*d. {{note|phpfloat}}PHP will unserialize any floating-point number correctly, but will serialize them to their full decimal expansion. For example, 3.14 will be serialized to 3.140000000000000124344978758017532527446746826171875.
*e. {{note|xmlguess}}[[XML data binding]]s and [[SOAP]] serialization tools provide type-safe XML serialization of programming [[data structure]]s into XML. Shown are XML values that can be placed in XML elements and attributes.
*f. {{note|lispstd}}This syntax is not compatible with the Internet-Draft, but is used by some dialects of [[Lisp (programming language)|Lisp]].
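
The digit string quoted in the PHP floating-point note above can be reproduced outside PHP. The following is a minimal sketch in Python rather than PHP, assuming only the standard-library <code>decimal</code> module; the helper name <code>full_decimal_expansion</code> is hypothetical and is not part of any format or library listed here. It shows that the long expansion is the exact value of the [[binary64]] number nearest to 3.14, so reading it back yields the same float.

```python
from decimal import Decimal


def full_decimal_expansion(value: float) -> str:
    """Return the exact decimal expansion of a binary64 float.

    Decimal(float) converts the stored binary value exactly, with no
    rounding, so the result shows every digit the float really holds.
    """
    return format(Decimal(value), "f")


expansion = full_decimal_expansion(3.14)
print(expansion)  # 3.140000000000000124344978758017532527446746826171875

# Parsing the full expansion gives back the very same float, which is why
# a serializer that writes all digits still round-trips correctly.
print(float(expansion) == 3.14)  # True
```

A serializer that instead writes the shortest string that round-trips (as Python's <code>repr</code> does) would emit <code>3.14</code>; both choices decode to the same value, and differ only in output size.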
 
==Comparison of binary formats==
<!--This table is meant to describe how the various datatypes are encoded in binary in the various formats.-->
 
{| class="wikitable"
{{sticky table start}}
|-
{| class="wikitable sortable sort-under sticky-table-head sticky-table-col1" style="font-size:75%"
|- style="vertical-align:bottom;"
! Format
! [[Nullable type|Null]]
Line 898 ⟶ 869:
! [[Floating-point]]
! [[String (computer science)|String]]
! [[Array (data type)|Array]]
! [[Associative array]]/[[object (computer science)|object]]
|- style="vertical-align:top;"
|-
| [[ASN.1]]<br>([[Basic Encoding Rules|BER]], [[Packed Encoding Rules|PER]] or [[Octet encoding rules|OER]] encoding)
| {{mono|NULL}} type
| {{mono|BOOLEAN}}: {{ubli
| BER: as 1 byte in binary form;
| PER: as 1 bit;
| OER: as 1 byte
}}
| {{mono|INTEGER}}: {{ubli
| BER: variable-length big-endian binary representation (up to 2{{sup|2{{sup|1024}}}} bits);
| PER Unaligned: a fixed number of bits if the integer type has a finite range; a variable number of bits otherwise;
| PER Aligned: a fixed number of bits if the integer type has a finite range and the size of the range is less than 65536; a variable number of octets otherwise;
| OER: 1, 2, or 4 octets (either signed or unsigned) if the integer type has a finite range that fits in that number of octets; a variable number of octets otherwise
}}
| {{mono|REAL}}: {{ubli
| base-10 real values are represented as character strings in ISO 6093 format;
| binary real values are represented in a binary format that includes the mantissa, the base (2, 8, or 16), and the exponent;
| the special values {{mono|NaN, -INF, +INF}}, and negative zero are also supported
}}
| Multiple valid types ({{mono|VisibleString, PrintableString, GeneralString, UniversalString, UTF8String}})
| Data specifications {{mono|SET OF}} (unordered) and {{mono|SEQUENCE OF}} (guaranteed order)
| User-definable type
|-
|- style="vertical-align:top;"
| [[Binn (serialization format)|Binn]]
| <code>\x00</code>
| True: <code>\x01</code><br />False: <code>\x02</code>
| [[big-endian]] [[2's complement]] signed and unsigned 8/16/32/64 bits
| [[Single precision floating-point format|single]]: [[big-endian]] [[binary32]]<br />[[Double precision floating-point format|double]]: [[big-endian]] [[binary64]]
| [[UTF-8]] encoded, null terminated, preceded by int8 or int32 string length in bytes
| Typecode (one byte) + 1-4 bytes size + 1-4 bytes items count + list items
| Typecode (one byte) + 1-4 bytes size + 1-4 bytes items count + key/value pairs
|-
| [[BSON]]
| <code>\x0A</code><br>(1 byte)
| True: <code>\x08\x01</code><br>False: <code>\x08\x00</code><br>(2 bytes)
| int32: 32-bit [[little-endian]] [[2's complement]] or int64: 64-bit [[little-endian]] [[2's complement]]
| [[Double-precision floating-point format|Double]]: [[little-endian]] [[binary64]]
| [[UTF-8]]-encoded, preceded by int32-encoded string length in bytes
| [[BSON]] embedded document with numeric keys
| [[BSON]] embedded document
|- style="vertical-align:top;"
|-
| [[CBOR|Concise Binary Object Representation]] (CBOR)
| <code>\xf6</code><br>(1 byte)
| {{ubli
| True: <code>\xf5</code>
| False: <code>\xf4</code>
}}
(1 byte)
| {{ubli
| Small positive/negative <code>\x00</code>–<code>\x17</code> & <code>\x20</code>–<code>\x37</code> (1 byte)
| 8-bit: positive <code>\x18</code>, negative <code>\x38</code> (+ 1 byte)
| 16-bit: positive <code>\x19</code>, negative <code>\x39</code> (+ 2 bytes)
| 32-bit: positive <code>\x1A</code>, negative <code>\x3A</code> (+ 4 bytes)
| 64-bit: positive <code>\x1B</code>, negative <code>\x3B</code> (+ 8 bytes)
| Negative x encoded as (−x − 1)
}}
| {{ubli
| IEEE half/single/double <code>\xf9</code>–<code>\xfb</code> (+ 2–8 bytes)
| Decimals and bigfloats (4+ bytes) encoded as <code>\xc4</code> tag + 2-item array of integer mantissa & exponent
}}
| {{ubli
| Length and content (1–9 bytes overhead)
| Bytestring <code>\x40</code>–<code>\x5f</code>
| [[UTF-8]] <code>\x60</code>–<code>\x7f</code>
| Indefinite partial strings <code>\x5f</code> and <code>\x7f</code> stitched together until <code>\xff</code>.
}}
| {{ubli
| Length and items <code>\x80</code>–<code>\x9e</code>
| Indefinite list <code>\x9f</code> terminated by <code>\xff</code> entry.
}}
| {{ubli
| Length (in pairs) and items <code>\xa0</code>–<code>\xbe</code>
| Indefinite map <code>\xbf</code> terminated by <code>\xff</code> key.
}}
|- style="vertical-align:top;"
| [[Efficient XML Interchange|Efficient XML Interchange (EXI)]]{{efn |group=binary |Any XML based representation can be compressed, or generated as, using EXI {{ndash}} {{Cite web |title=Efficient XML Interchange (EXI) Format 1.0 (Second Edition) |url=https://www.w3.org/TR/2014/REC-exi-20140211/Overview.html}}<ref>{{Cite web |title=Efficient Extensible Interchange |url=https://www.w3.org/XML/EXI/index.html}}</ref> {{ndash}} which is a "Schema Informed" (as opposed to schema-required, or schema-less) binary compression standard for XML.}}<br>
(Unpreserved lexical values format)
| xsi:nil is not allowed in binary context.
| 1–2 bit integer interpreted as boolean.
| Boolean sign, plus arbitrary length 7-bit octets, parsed until most-significant bit is 0, in little-endian. The schema can set the zero-point to any arbitrary number.<br>
Unsigned skips the boolean flag.
| {{ubli
| Float: integer mantissa and integer exponent.
| Decimal: boolean sign, integer whole value, integer fractional.
}}
| Length prefixed integer-encoded Unicode. Integers may represent enumerations or string table entries instead.
| Length prefixed set of items.
| {{No|Not in protocol.}}
|- style="vertical-align:top;"
|-
| [[FlatBuffers]]
| Encoded as absence of field in parent object
| {{ubli
| True: <code>\x01</code>
| False: <code>\x00</code>
}}
(1 byte)
| [[Little-endian]] [[2's complement]] signed and unsigned 8/16/32/64 bits
| {{ubli
| [[Single-precision floating-point format|Floats]]: [[little-endian]] [[binary32]]
| [[Double-precision floating-point format|Doubles]]: [[little-endian]] [[binary64]]
}}
| [[UTF-8]]-encoded, preceded by 32-bit integer length of string in bytes
| Vectors of any other type, preceded by 32-bit integer length of number of elements
| Tables (schema defined types) or Vectors sorted by key (maps / dictionaries)
|- style="vertical-align:top;"
| [[Ion (serialization format)|Ion]]<ref>[http://amzn.github.io/ion-docs/docs/binary.html Ion Binary Encoding]</ref>
| <code>\x0f</code>{{efn |group=binary |All basic Ion types have a null variant, as its 0xXf tag. Any tag beginning with 0x0X other than 0x0f defines ignored padding.}}
| {{ubli
| True: <code>\x11</code>
| False: <code>\x10</code>
}}
| {{ubli
| Positive <code>\x2x</code>, negative <code>\x3x</code>
| Zero is always encoded in tag byte.
| BigInts over 13 bytes (104 bits) have 1+ byte overhead for length
}}
| {{ubli
| <code>\x44</code> (32-bit float)
| <code>\x48</code> (64-bit float)
| Zero is always encoded in tag byte.
}}
| {{ubli
| [[UTF-8]]: <code>\x8x</code>
| Other strings: <code>\x9x</code>
| Arbitrary length and overhead
}}
| <code>\xbx</code> Arbitrary length and overhead. Length in octets.
| {{ubli
| Structs (numbered fields): <code>\xdx</code>
| Annotations (named fields): <code>\xex</code>
}}
|- style="vertical-align:top;"
| [[MessagePack]]
| <code>\xc0</code>
| {{ubli
| True: <code>\xc3</code>
| False: <code>\xc2</code>
}}
| {{ubli
| Single byte "fixnum" (values {{nowrap|−32 – 127}})
| ''or'' typecode (1 byte) + big-endian (u)int8/16/32/64
}}
| Typecode (1 byte) + IEEE single/double
| {{ubli
| Typecode + up to 15 bytes
| ''or'' typecode + length as uint8/16/32 + bytes;
}}
encoding is unspecified<ref>{{cite web|url=https://github.com/msgpack/msgpack|title=MessagePack is an extremely efficient object serialization library. It's like JSON, but very fast and small.: msgpack/msgpack|date=2 April 2019|via=GitHub}}</ref>
| {{ubli
| As "fixarray" (single-byte prefix + up to 15 array items)
| ''or'' typecode (1 byte) + 2–4 bytes length + array items
}}
| {{ubli
| As "fixmap" (single-byte prefix + up to 15 key-value pairs)
| ''or'' typecode (1 byte) + 2–4 bytes length + key-value pairs
}}
|- style="vertical-align:top;"
| [[Netstring]]s{{efn |group=binary |Interpretation of Netstrings is entirely application- or schema-dependent.}}
| {{No|Not in protocol.}}
| {{No|Not in protocol.}}
| {{No|Not in protocol.}}
| {{No|Not in protocol.}}
| Length encoded as ASCII decimal digits + ':' + data + ','<br>
Length counts only octets between ':' and ','
| {{No|Not in protocol.}}
| {{No|Not in protocol.}}
|- style="vertical-align:top;"
| [[OGDL]] Binary
|
|
|
|- style="vertical-align:top;"
| [[Property list]]<br>(binary format)
|
|
|
|- style="vertical-align:top;"
| [[Protocol Buffers]]
|
|
| {{ubli
| Variable encoding length signed 32-bit: varint encoding of "ZigZag"-encoded value <code>(n << 1) [[XOR]] (n >> 31)</code>
| Variable encoding length signed 64-bit: varint encoding of "ZigZag"-encoded value <code>(n << 1) XOR (n >> 63)</code>
| Constant encoding length 32-bit: 32 bits in [[little-endian]] [[2's complement]]
| Constant encoding length 64-bit: 64 bits in [[little-endian]] [[2's complement]]
}}
| {{ubli
| [[Single-precision floating-point format|Floats]]: [[little-endian]] [[binary32]]
| [[Double-precision floating-point format|Doubles]]: [[little-endian]] [[binary64]]
}}
| [[UTF-8]]-encoded, preceded by varint-encoded integer length of string in bytes
| Repeated value with the same tag or, for varint-encoded integers only, values packed contiguously and prefixed by tag and total byte length
| {{n/a}}
|- style="vertical-align:top;"
| [[Ethereum|Recursive Length Prefix]]
| Not in protocol.<br>
<code>\x80</code> (zero-length string) often used
| Not in protocol.<br>Integer 0/1 often used.
| 0 - 127: <code>\x00</code> - <code>\x7f</code><br>
Other values: Strings of big-endian encoded bytes, of [[Bignum|arbitrary length]], beginning with <code>\x80</code> - <code>\xbf</code>
| Integer encodings may be interpreted as IEEE float.
| Length prefixed, up to 55 bytes: <code>\x80</code> - <code>\xb7</code> followed by data.<br>
56+ bytes: <code>\xb8</code> - <code>\xbf</code> followed by 1-8 byte integer length of string followed by data.
| Length prefixed, up to 55 bytes: <code>\xc0</code> - <code>\xf7</code> followed by data.<br>
56+ bytes: <code>\xf8</code> - <code>\xff</code> followed by 1-8 byte integer length of data followed by data.<br>
Length is always in bytes, not in list items.
| Not in protocol. May be encoded as lists of key/value pair lists or other formats.
|-
| [[Smile (data interchange format)|Smile]]
| <code>\x21</code>
| {{ubli
| True: <code>\x23</code>
| False: <code>\x22</code>
}}
| {{ubli
| Single byte "small" (values {{nowrap|−16 – 15}} encoded as {{nowrap|<code>\xc0</code>–<code>\xdf</code>}}),
| zigzag-encoded <code>varint</code>s (1–11 data bytes), or <code>BigInteger</code>
}}
| IEEE single/double, <code>BigDecimal</code>
| Length-prefixed "short" Strings (up to 64 bytes), marker-terminated "long" Strings and (optional) back-references
| Arbitrary-length heterogeneous arrays with end-marker
| Arbitrary-length key/value pairs with end-marker
|- style="vertical-align:top;"
| [[SDXF|Structured Data eXchange Formats]] (SDXF)
|
|
| Big-endian signed 24-bit or 32-bit integer
| Big-endian IEEE double
| Either [[UTF-8]] or ISO 8859-1 encoded
| List of elements with identical ID and size, preceded by array header with int16 length
| Chunks can contain other chunks to arbitrary depth.
|- style="vertical-align:top;"
| [[Thrift (protocol)|Thrift]]
|
|
|}
 
{{sticky table end}}
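The netstring framing listed in the table (length as ASCII digits, then ':', the data, then ',') can be illustrated with a minimal Python sketch. The function names <code>netstring_encode</code> and <code>netstring_decode</code> are hypothetical and chosen only for this illustration; the sketch handles a single frame and is not a reference implementation.

```python
def netstring_encode(payload: bytes) -> bytes:
    """Frame payload as <decimal length>:<payload>, (hypothetical helper)."""
    # The length counts only the octets between ':' and ','.
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def netstring_decode(frame: bytes) -> bytes:
    """Return the payload of exactly one netstring, checking the framing."""
    # Split on the first ':' only; the payload itself may contain ':' or ','.
    length_digits, sep, rest = frame.partition(b":")
    if not sep or not length_digits.isdigit():
        raise ValueError("missing or malformed length prefix")
    length = int(length_digits)
    # 'rest' must be the payload followed by exactly one trailing ','.
    if len(rest) != length + 1 or rest[-1:] != b",":
        raise ValueError("length does not match payload, or missing ','")
    return rest[:length]
```

Because the length is stated up front, a decoder never has to scan the payload for a terminator, which is why the payload may contain any octet, including ':' and ','.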
{{notelist|group=binary}}
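The "ZigZag" plus varint integer encoding listed for Protocol Buffers can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the library's own code: the function names are hypothetical, and the <code>bits</code> parameter is an assumption standing in for the 32-bit and 64-bit variants in the table.

```python
def zigzag_encode(n: int, bits: int = 64) -> int:
    """Map a signed integer to an unsigned one: (n << 1) XOR (n >> (bits - 1))."""
    # Python's >> on a negative int is an arithmetic shift, as the formula needs.
    mask = (1 << bits) - 1
    return ((n << 1) ^ (n >> (bits - 1))) & mask


def zigzag_decode(z: int) -> int:
    """Invert zigzag_encode for a value that fits the chosen width."""
    return (z >> 1) ^ -(z & 1)


def varint_encode(value: int) -> bytes:
    """Encode a non-negative integer in 7-bit groups, least significant first."""
    if value < 0:
        raise ValueError("varint input must be non-negative; ZigZag-encode first")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more groups follow
        else:
            out.append(byte)
            return bytes(out)


def varint_decode(data: bytes) -> int:
    """Decode one varint from the start of data."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return result
    raise ValueError("truncated varint")
```

ZigZag maps values of small magnitude, whether positive or negative, to small unsigned numbers (0, −1, 1, −2 become 0, 1, 2, 3), so the subsequent varint step stays short for them.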
 
==See also==
*[[Comparison of document markup languages]]
 
==References==
[[Category:Data serialization formats]]
[[Category:Persistence]]
[[Category:Computing comparisons|Data-serialization formats]]