Delta encoding: Difference between revisions

Content deleted Content added
Undid revision 1115426231 by 67.162.198.56 (talk)
 
(22 intermediate revisions by 15 users not shown)
Line 1:
{{Short description|Type of data transmission method}}
{{Distinguish|Elias delta coding|delta modulation}}
 
Line 5 ⟶ 6:
The differences are recorded in discrete files called "deltas" or "diffs". In situations where differences are small – for example, the change of a few words in a large document or the change of a few records in a large table – delta encoding greatly reduces data redundancy. Collections of unique deltas are substantially more space-efficient than their non-encoded equivalents.
 
From a logical point of view, the difference between two data values is the information required to obtain one value from the other – see [[relative entropy]]. The difference between identical values (under some [[equivalence relation|equivalence]]) is often called ''0'' or the neutral element.
 
==Simple example==
Perhaps the simplest example is storing values of bytes as differences (deltas) between sequential values, rather than the values themselves. So, instead of 2, 4, 6, 9, 7, we would store 2, 2, 2, 3, −2. This reduces the [[variance]] (range) of the values when neighbor samples are correlated, enabling a lower bit usage for the same data. [[Interchange File Format|IFF]] [[8SVX]] sound format applies this encoding to raw sound data before applying compression to it. Unfortunately, notNot even all 8-bit sound [[sampling (signal processing)|samples]] compress better when delta encoded, and the usability of delta encoding is even smaller for 16-bit and better samples. Therefore, compression algorithms often choose to delta encode only when the compression is better than without. However, in [[video compression]], delta frames can considerably reduce frame size and are used in virtually every video compression [[codec]].
 
==Definition==
A delta can be defined in 2 ways, ''symmetric delta'' and ''directed delta''. A ''symmetric delta'' can be expressed as
: <math>\Delta(v_1version_1, v_2version_2) = (v_1version_1 \setminus v_2version_2) \cup (v_2version_2 \setminus v_1version_1),</math>
where <math>v_1</math> and <math>v_2</math> represent two versions.
 
A ''directed delta'', also called a change, is a sequence of (elementary) change operations which, when applied to one version <math>v_1version_1</math>, yields another version, <math>v_2version_2</math> (note the correspondence to [[transaction log]]s in databases). In computer implementations, they typically take the form of a language with two commands: ''copy data from v1<math>version_1</math>'' and ''write literal data''.
 
===Variants===
Line 55:
 
==Examples==
===Binary delta compression===
{{excerpt|Binary delta compression}}
 
===Delta encoding in HTTP===
Line 69 ⟶ 71:
 
=== Delta copying ===
''Delta copying'' is a fast way of copying a file that is partially changed, when a previous version is present on the destination ___location. With delta copying, only the changed part of a file is copied. It is usually used in [[backup]] or [[file copying]] software, often to save [[Bandwidth (computing)|bandwidth]] when copying between computers over a private network or the internet. One notable open-source example is [[rsync]].<ref>{{Cite web |url=http://www.2brightsparks.com/bb/viewtopic.php?t=4473 |title=Feature request: Delta copying - 2BrightSparks |access-date=2016-04-29 |archive-date=2016-03-13 |archive-url=https://web.archive.org/web/20160313021725/http://www.2brightsparks.com/bb/viewtopic.php?t=4473 |url-status=dead }}</ref><ref>{{Cite web|url=https://www.bvckup2.com/support/forum/topic/739|title = Bvckup 2 &#124; Forum &#124; How delta copying works}}</ref><ref>[http://www.eggheadcafe.com/software/aspnet/33678264/delta-copying.aspx]{{Dead link|date=July 2019 |bot=InternetArchiveBot |fix-attempted=yes }}</ref>
 
==== Online backup ====
Line 86 ⟶ 88:
{{main|Git (software)}}
 
The Git source code control system employs delta compression in an auxiliary "[http://git-scm.com/docs/git-repack git repack]" operation. Objects in the repository that have not yet been delta-compressed ("loose objects") are compared against a heuristically chosen subset of all other objects, and the common data and differences are concatenated into a "pack file" which is then compressed using conventional methods. In common use cases, where source or data files are changed incrementally between commits, this can result in significant space savings. The repack operation is typically performed as part of the "[http git gc"<ref>{{Cite web|url=https://git-scm.com/docs/git-gc|title=Git git- git-gc]" Documentation|website=git-scm.com|accessdate=November 9, 2024}}</ref> process, which is triggered automatically when the numbers of loose objects or pack files exceed configured thresholds.
 
The format is documented in the pack-format page of the Git documentation. It implements a directed delta.<ref>{{cite web |title=Git - pack-format Documentation |url=https://git-scm.com/docs/pack-format |website=Git documentation |access-date=13 January 2020}}</ref>
Line 95 ⟶ 97:
 
===GDIFF===
Generic Diff Format (GDIFF) is another directed delta encoding format. It was submitted to [[W3C]] in 1997.<ref>[http{{Cite web|url=https://www.w3.org/TR/NOTE-gdiff-19970901 .html|title=Generic Diff Format Specification]|website=www.w3.org}}</ref> In many cases, VCDIFF has better compression rate than GDIFF.
 
===bsdiff===
Bsdiff is a binary diff program using [[Suffix array|suffix sorting]]. For executables that contain many changes in pointer addresses, it performs better than VCDIFF-type "copy and literal" encodings. The intent is to find a way to generate a small diff without needing to parse assembly code (as in Google's Courgette). Bsdiff achieveachieves this by allowing "copy" matches with errors, which are then corrected using an extra "add" array of bytewise differences. Since this array is mostly either zero or repeated values for offset changes, it takes up little space after compression.<ref>Colin Percival, Naive differences of executable code,{{Cite web|url=http://www.daemonology.net/bsdiff/|title=Binary diff|website=www.daemonology.net|accessdate=November 9, 2003.2024}}</ref>
 
Bsdiff is useful for delta updates. Google uses bsdiff in Chromium and Android. The ''deltarpm'' feature of the [[RPM Package Manager]] is based on a heavily- modified bsdiff that can use a hash table for matching.<ref>{{cite web |title=rpmdelta/delta.c |url=https://github.com/rpm-software-management/deltarpm/blob/c5e0ca7482e2cfea5e4d902ffe488e0a71ed3e67/delta.c |publisher=rpm-software-management |access-date=13 January 2020 |date=3 July 2019}}</ref> [[FreeBSD]] also uses bsdiff for updates.<ref>{{cite web|author=Anonymous|website=GitHub Gist|url=https://gist.github.com/anonymous/e48209b03f1dd9625a992717e7b89c4f|title=NON-CRYPTANALYTIC ATTACKS AGAINST FREEBSD UPDATE COMPONENTS|date=May 2016}}</ref>
 
Since the 4.3 release of bsdiff in 2005, various improvements or fixes have been produced for it. Google maintains multiple versions of the code for each of its products.<ref>{{cite web |title=xtraeme/bsdiff-chromium: README.chromium |url=https://github.com/xtraeme/bsdiff-chromium/blob/master/README.chromium |website=GitHub |language=en |date=2012}}; {{cite web |title=courgette/third_party/bsdiff/README.chromium - chromium/src |url=https://chromium.googlesource.com/chromium/src/+/master/courgette/third_party/bsdiff/README.chromium |website=Git at Google}}; {{cite web |title= android/platform/external/bsdiff/|url=https://android.googlesource.com/platform/external/bsdiff/+/refs/heads/master|website=Git at Google}}</ref> FreeBSD takes many of Google's compatible changes, mainly a vulnerability fix and a switch to the faster {{code|divsufsort}} suffix-sorting routine.<ref>{{cite web|url=https://github.com/freebsd/freebsd/commits/master/usr.bin/bsdiff|website=GitHub|title=History for freebsd/usr.bin/bsdiff}}</ref> [[Debian]] has a series of performance tweaks to the program.<ref>{{cite web|url=https://sources.debian.org/patches/bsdiff/ |website=Debian Patch Tracker |title=Package: bsdiff}}</ref>
Line 107 ⟶ 109:
 
==See also==
 
* [[{{Annotated link |Data differencing]]}}
* [[{{Annotated link |Interleaved deltas]]}}
* [[{{Annotated link |Source Code Control System]]}}
* [[{{Annotated link |String-to-string correction problem]]}}
* [[Xdelta]]: open-source delta encoder
 
Line 117 ⟶ 120:
 
==External links==
* RFC{{IETF RFC|3229|link=no}} – Delta Encoding in HTTP
 
{{Compression methods}}
Line 126 ⟶ 129:
[[Category:Data differencing]]
[[Category:Articles with example C code]]
[[Category:Data compression]]