Talk:Comparison of data-serialization formats: Difference between revisions

Zzo38 (talk | contribs)
No edit summary
 
(33 intermediate revisions by 16 users not shown)
Line 1:
{{oldafdfull|page=Criticism of XML|date=6 August 2009|result='''Move''' to [[Comparison of data serialization formats]]}}
{{WikiProject banner shell|class=List|
{{WikiProject Computing }}
}}
 
==Reason for this article==
Line 19 ⟶ 22:
 
I second the inclusion of XDR [[User:Jann.poppinga|Jann.poppinga]] ([[User talk:Jann.poppinga|talk]]) 10:57, 19 March 2010 (UTC)
 
Shouldn't Boost Serialization be included here? <!-- Template:Unsigned IP --><small class="autosigned">—&nbsp;Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/98.171.183.235|98.171.183.235]] ([[User talk:98.171.183.235#top|talk]]) 00:54, 25 October 2022 (UTC)</small> <!--Autosigned by SineBot-->
 
== This section is in the wrong place ==
Line 77 ⟶ 82:
}} Proposes an alternative system for encoding overlapping elements. </ref>. However, SOAP encoding demonstrates the ease with which graphs can be serialized using proper ID and IDREF usage.
* Transformations, even identity transforms, result in changes to format (whitespace, attribute ordering, attribute quoting, whitespace around attributes, newlines). These problems can make [[diff]]-ing the XML source very difficult except via [[Canonical XML]].
* Unlike JSON or YAML, XML does not map directly and unambiguously to an associative array. <!-- Template:Unsigned --><span class="autosigned" style="font-size:85%;">—&nbsp;Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[User:Cowlinator|Cowlinator]] ([[User talk:Cowlinator#top|talk]] • [[Special:Contributions/Cowlinator|contribs]]) 16:00, 16 February 2021 (UTC)</span> <!--Autosigned by SineBot-->
 
{{reflist-talk}}
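Here is a minimal sketch of the last bullet, in Python using only the standard library. The function `naive_xml_to_dict` is hypothetical. It is a deliberately simple mapping of the kind ad hoc converters use, not the behaviour of any particular library. It shows two ambiguities. First, repeated child elements collapse into a single key. Second, an attribute and a child element with the same name become indistinguishable. The JSON text, by contrast, decodes straight into a dict.

```python
import json
import xml.etree.ElementTree as ET

def naive_xml_to_dict(elem):
    """Hypothetical naive mapping: attributes and child elements both become dict keys."""
    result = dict(elem.attrib)
    for child in elem:
        # Leaf children map to their text; a later sibling with the same tag
        # silently overwrites an earlier one.
        result[child.tag] = naive_xml_to_dict(child) if len(child) else child.text
    return result

xml_doc = '<book id="1"><author>Ada</author><author>Grace</author></book>'
json_doc = '{"id": "1", "author": ["Ada", "Grace"]}'

print(naive_xml_to_dict(ET.fromstring(xml_doc)))  # {'id': '1', 'author': 'Grace'}
print(json.loads(json_doc))                        # {'id': '1', 'author': ['Ada', 'Grace']}

# An attribute and a child element end up as the same key.
attr_form = naive_xml_to_dict(ET.fromstring('<book id="1"/>'))
child_form = naive_xml_to_dict(ET.fromstring('<book><id>1</id></book>'))
print(attr_form == child_form)  # True
```

A real converter has to pick conventions, such as always producing lists or prefixing attribute keys. Different tools pick differently, which is the ambiguity the bullet describes.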
Line 82 ⟶ 88:
===Human Readable?===
XML should only be tagged as partially human-readable. Simple, basic XML files can be read, but once xmlns and xsd come into play, it quickly stops being human-readable. Another factor is that it's not always possible to properly reformat/indent XML for readability without affecting content. [[Special:Contributions/81.220.246.44|81.220.246.44]] ([[User talk:81.220.246.44|talk]]) 14:20, 24 October 2014 (UTC)
 
===Quite Human Readable!===
Concerning the "not human-readable" items mentioned above, XML namespace declaration attributes ("xmlns") and XML Schema definitions (XSDs) are text just like XML, and perfectly human-readable. And you absolutely can reformat/indent XML without affecting content. That, i.e. explicit value delimitation via tags and attributes, is the whole point and benefit over whitespace-delimited formats like YAML (one drawback of which '''is''' the negative effect of improper or inconsistent indentation). <!-- Template:Unsigned IP --><small class="autosigned">—&nbsp;Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/192.91.171.42|192.91.171.42]] ([[User talk:192.91.171.42#top|talk]]) 17:11, 17 January 2020 (UTC)</small> <!--Autosigned by SineBot-->
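The following is a small Python illustration of the whitespace question debated in the two comments above. It uses the standard library's `xml.etree.ElementTree`. The two documents differ only in indentation. The text of each leaf element is identical, but the indentation becomes extra text nodes, so the concatenated string value of the root differs. Whether those whitespace nodes count as "content" depends on the application (for instance, whether mixed content is allowed). This sketch does not settle the argument either way.

```python
import xml.etree.ElementTree as ET

compact = "<name><first>Ada</first><last>Lovelace</last></name>"
indented = """<name>
  <first>Ada</first>
  <last>Lovelace</last>
</name>"""

def leaf_texts(xml_text):
    """Text of each direct child element, ignoring whitespace between elements."""
    return [child.text for child in ET.fromstring(xml_text)]

def string_value(xml_text):
    """All text nodes concatenated, roughly like XPath string() on the root."""
    return "".join(ET.fromstring(xml_text).itertext())

print(leaf_texts(compact) == leaf_texts(indented))  # True
print(repr(string_value(compact)))                  # 'AdaLovelace'
print(repr(string_value(indented)))                 # '\n  Ada\n  Lovelace\n'
```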
 
== JSON Associative Array Error ==
Line 116 ⟶ 125:
== Missing other Protocol Buffers flavors ==
To be complete, FlatBuffers (http://google.github.io/flatbuffers/) and Cap'n Proto (https://capnproto.org/) could be mentioned. [[Special:Contributions/128.237.28.16|128.237.28.16]] ([[User talk:128.237.28.16|talk]]) 16:00, 24 February 2015 (UTC)
 
Agree. [[Cap'n Proto]] is very interesting and would be a good comparison. I do not have enough in-depth knowledge to create an official entry. [[User:CaliViking|CaliViking]] ([[User talk:CaliViking|talk]]) 18:33, 29 September 2022 (UTC)
 
== Misleading "Standardized?" Column ==
Line 137 ⟶ 148:
https://github.com/edn-format/edn <small class="autosigned">—&nbsp;Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/164.144.252.29|164.144.252.29]] ([[User talk:164.144.252.29|talk]]) 18:56, 11 November 2015 (UTC)</small><!-- Template:Unsigned IP --> <!--Autosigned by SineBot-->
:EDN seems to have an article here: [[Extensible Data Notation]]. [[Special:Contributions/50.53.1.21|50.53.1.21]] ([[User talk:50.53.1.21|talk]]) 22:04, 29 October 2017 (UTC)
 
== Missing [[Cap'n Proto]] ==
Looks very interesting. I don't have the in-depth knowledge to write a good entry. The creator of the format is very knowledgeable in the field, being the primary author of Protocol Buffers version 2.
See also https://capnproto.org/ , https://github.com/capnproto/ , https://groups.google.com/g/capnproto
[[User:CaliViking|CaliViking]] ([[User talk:CaliViking|talk]]) 18:33, 29 September 2022 (UTC)
 
== External links modified ==
Line 175 ⟶ 191:
:::::Continuing the Python comparison (which also answers your deleted comment): Including Hjson in a [[serialization]] format discussion is like including Python instead of Pickle. [[Special:Contributions/185.213.154.172|185.213.154.172]] ([[User talk:185.213.154.172|talk]]) 21:32, 13 August 2019 (UTC)
:::::By YAML being a standard, I was referring to how exceedingly common it is. I.e. a de facto standard. [[Special:Contributions/185.213.154.172|185.213.154.172]] ([[User talk:185.213.154.172|talk]]) 21:34, 13 August 2019 (UTC)
:::::I agree with MrOllie above. I would like to add that when an article for Hjson exists, and the [[configuration file]] format is shown to be noteworthy, it seems like it would best fit in the "See also" section of the JSON article, under "Related formats". [[Special:Contributions/185.213.154.172|185.213.154.172]] ([[User talk:185.213.154.172|talk]]) 22:20, 13 August 2019 (UTC)
 
== Additional and upcoming formats possibly not worthy of mentioning in the main article, but shall be gathered anyway ==
 
Okay, there are ''a lot'' of formats out there. I open this section to collect and reference them, and if they reach the notability threshold they can be included in the main article. I'm sure there's plenty.
Also, there is some overlap between ''data-serialization formats'', ''[[data exchange]] formats'' and ''[[configuration file]] formats''. I make no distinction here since, in my opinion, they are mostly the same set of formats with very similar purposes and only slightly different attributes. --[[user:grin|grin]] [[user talk:grin|✎]] 09:39, 20 February 2020 (UTC)
:{{reply to|grin}} TOML isn't a [[serialization]] format used as a [[configuration file]] format ([[Configuration_file#Serialization_formats|as often happens]]), it's explicitly designed as a configuration file format. To quote the [https://github.com/toml-lang/toml official objectives]: "TOML aims to be a minimal configuration file format that's easy to read due to obvious semantics."
:The same goes for INI files, and HOCON (it's even in the name: Human-Optimized '''Config''' Object Notation). To quote the HOCON [https://github.com/lightbend/config/blob/master/HOCON.md informal specification]: "The primary goal is: keep the semantics (tree structure; set of types; encoding/escaping) from JSON, but make it more convenient as a human-editable config file format.". Thus everything that HOCON can serialize, JSON can serialize. HOCON is entirely superfluous when it comes to serialization (which this article is about).
:This article is crowded as is. If someone wants a table comparing configuration file formats, then why not create the article [[Comparison of configuration file formats]]? Then this article could link to that, and that article could link to this. Nothing's stopping that, right?
:[[Special:Contributions/193.138.218.217|193.138.218.217]] ([[User talk:193.138.218.217|talk]]) 11:29, 20 June 2020 (UTC)
::Obviously not right, since I guess it would be deleted within the first few minutes with the justification "there's already a data-serialisation format article, insert it there", or at least that has a pretty high probability of happening. But it's not my child, do as you please. *shrug* --[[user:grin|grin]] [[user talk:grin|✎]] 14:12, 21 June 2020 (UTC)
:: @193.138.218.217, I'm not sure why you are talking about serialization formats and configuration file formats as if they are mutually exclusive. Anyone can use a configuration file format for serialization, and anyone can use a serialization format as a configuration file. The intended usage is irrelevant. All of the below formats should be added to the article. ----[[User:Cowlinator|Cowlinator]] ([[User talk:Cowlinator|talk]]) 15:43, 16 February 2021 (UTC)
 
==== Human readable ====
* [[sdlang]] - https://sdlang.org/
* [[TOML]] - already listed as DX format
* [[StrictYAML]] - [https://github.com/crdoconnor/strictyaml type-safe YAML] that parses and validates a restricted subset of the YAML specification
* [[INI file]] - already listed as CFG format
* [[HJSON]] - mentioned as JSON CFG format (also [[JSON5]])
* [[CSON]] - CoffeeScript JSON; JSON without the braces
* [[HOCON]] - CFG format popular among Scala projects. It is a superset of JSON.
 
==== Binary ====
 
* [[UAVCAN]] -- a pub/sub protocol that defines an interface definition language (DSDL)
 
== In what way is YAML not standardized? ==
 
In what way is YAML not standardized? ----[[User:Cowlinator|Cowlinator]] ([[User talk:Cowlinator|talk]]) 16:02, 16 February 2021 (UTC)
 
== Other data serialization formats ==
 
Hi,
 
Other media types like images, audio and video are data too. The main difference might be that they usually are binary encoded, but that's okay since there are plenty of binary data formats already in the article [[Comparison of data-serialization formats]].
 
I suggest referring to [[Media type]]s as they're standardized.
 
Have a nice day :) [[User:Dun Nic|Dun Nic]] ([[User talk:Dun Nic|talk]]) 19:22, 14 October 2022 (UTC)
 
== Additional characteristics for comparison ==
 
Hi, I'm missing a key characteristic (at least, it's key to me).
 
Lacking a better name for it, we can refer to it as '''streaming'''. A data serialization format supporting streaming would support an '''unlimited''' number of '''items in one''' data '''stream'''.
 
For instance:
 
* [[CSV]] supports one item per line.
* [[Log file]]s support one item per line.
* [[YAML]] supports multiple "documents".
* [[Multipart/form-data]] supports many "parts" and even of different MIME types.
* [[JSON]] on the other hand, does not allow concatenation of multiple documents.
* The same goes for [[XML]]: it requires exactly one (closed) root element.
 
Have a nice day :) [[User:Dun Nic|Dun Nic]] ([[User talk:Dun Nic|talk]]) 19:39, 14 October 2022 (UTC)
 
:Yes, we should add a streamable field. It can be argued, though, that JSON is partially streamable, as there is no rule against sending multiple objects in one document.
:I propose:
:- Streamable: Yes. This means it is explicitly designed for streaming or live appending, such as CSV and log files.
:- Streamable: Somewhat. This means it was not designed for it, but there are ways to stream it (such as making a file/stream an implicit array of JSON objects).
:- Streamable: No. Formats which cannot be streamed, such as XML, where streaming would inherently violate the structure. [[User:Tryoxiss|Tryoxiss]] ([[User talk:Tryoxiss|talk]]) 00:59, 7 December 2023 (UTC)
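The following is a minimal Python sketch of the "Somewhat" case. It assumes values are simply written back to back, as in newline-delimited conventions such as JSON Lines, which sit outside the JSON specification itself. `json.loads` rejects such a stream as a whole. However, the standard library's `json.JSONDecoder.raw_decode` returns each value together with the index where it ended, so values can be read one after another. For brevity, the sketch works on an in-memory string rather than reading incrementally from a file or socket.

```python
import json

JSON_WHITESPACE = " \t\r\n"

def iter_concatenated_json(text):
    """Yield each JSON value from text that holds several values back to back."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos] in JSON_WHITESPACE:
            pos += 1
        if pos == len(text):
            return
        value, pos = decoder.raw_decode(text, pos)
        yield value

stream = '{"id": 1}\n{"id": 2}\n{"id": 3}\n'

print(list(iter_concatenated_json(stream)))  # [{'id': 1}, {'id': 2}, {'id': 3}]

try:
    json.loads(stream)
except json.JSONDecodeError as err:
    print("json.loads rejects the stream:", err.msg)  # Extra data
```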
 
== Missing RDF ==
 
[[RDF]] seems legit to me :) [[User:Dun Nic|Dun Nic]] ([[User talk:Dun Nic|talk]]) 19:39, 14 October 2022 (UTC)
 
== PostScript binary format ==
 
There is also the [[PostScript]] binary format. It has the advantage that you might not need to parse all of the data to find something; each part contains the address of the sub-parts. However, it also has disadvantages such as lack of 64-bit integers, and strings cannot exceed 64K. --[[User:Zzo38|Zzo38]] ([[User talk:Zzo38|talk]]) 01:34, 6 April 2024 (UTC)
 
:Postscript is a programming language, not a serialization format. [[Special:Contributions/83.84.234.140|83.84.234.140]] ([[User talk:83.84.234.140|talk]]) 10:56, 18 May 2024 (UTC)
 
::This is true, but there is the PostScript binary serialization format, which is a subset of PostScript like JSON is a subset of JavaScript. (The PostScript binary serialization is a perfectly valid PostScript code. PostScript has the unusual feature of being a programming language that can be written in text but also has binary forms for many tokens and allows them to be mixed together in the same program.) --[[User:Zzo38|Zzo38]] ([[User talk:Zzo38|talk]]) 05:49, 31 December 2024 (UTC)
 
== Very difficult to read ==
 
I'm viewing this on an average-sized monitor using Chrome and it's very very difficult to "read" these tables as I can only see half of the table. I have to scroll down to the bottom of the table, where I find a horizontal scroll bar, move it along, then go up but then I can't see what on earth I'm looking at because I've lost the LHS. It's basically unviewable. What madness is this? [[Special:Contributions/2A02:C7C:5C4A:6900:18D1:DEE:D5D5:4EB1|2A02:C7C:5C4A:6900:18D1:DEE:D5D5:4EB1]] ([[User talk:2A02:C7C:5C4A:6900:18D1:DEE:D5D5:4EB1|talk]]) 12:10, 26 August 2024 (UTC)
 
== Data type comparison ==
 
I think it would be worth comparing which data types are available. Some such types are:
* Null
* Boolean
* Integers: Some formats limit the number of bits, while others have no maximum.
* Floating point
* Bit strings
* Byte strings
* Character strings: There is also consideration of character sets. For example, some formats are limited to Unicode, while others allow other character sets. (Some formats have no character string type separate from the byte string type.)
* Date/time types
* Sequence
* Key/value list
* Unordered set or multiset
* Object identifier
* References to other nodes
* User-defined types
* Other types which do not match the above
--[[User:Zzo38|Zzo38]] ([[User talk:Zzo38|talk]]) 05:47, 31 December 2024 (UTC)
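As a rough illustration of how such a comparison could be checked for one format, the following sketch probes which of the listed types Python's standard `json` module can encode with default settings. It measures that encoder, not the JSON specification itself. For example, Python writes arbitrarily large integers, while RFC 8259 lets implementations limit number range and warns about interoperability beyond IEEE 754 double precision. The sample values are arbitrary choices for illustration.

```python
import datetime
import json

def json_encodable(value):
    """Return True if Python's json module encodes value with default settings."""
    try:
        json.dumps(value)
        return True
    except TypeError:
        return False

samples = {
    "null": None,
    "boolean": True,
    "big integer": 2**100,
    "floating point": 1.5,
    "character string": "text",
    "byte string": b"\x00\xff",
    "date/time": datetime.datetime(2024, 12, 31),
    "sequence": [1, 2, 3],
    "key/value list": {"k": "v"},
    "unordered set": {1, 2},
}

for name, value in samples.items():
    print(f"{name}: {json_encodable(value)}")
```

With default settings, the byte string, date/time and set samples raise `TypeError`, and the rest encode. Formats without native types for these usually fall back on conventions such as base64 or ISO 8601 strings.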