Golomb coding

{{distinguish|Exponential-Golomb coding}}
{{Short description|Lossless data compression method}}
{{More citations needed|section|date=October 2024|talk=Many sections unsourced}}
'''Golomb coding''' is a [[lossless data compression]] method using a family of [[data compression]] codes invented by [[Solomon W.&nbsp;Golomb]] in the 1960s. Alphabets following a [[geometric distribution]] will have a Golomb code as an optimal [[prefix code]],<ref>{{Cite journal | last1 = Gallager | first1 = R. G. |last2 = van Voorhis |first2 = D. C.| title = Optimal source codes for geometrically distributed integer alphabets | journal = [[IEEE Transactions on Information Theory]]| volume = 21 | issue = 2 | pages = 228–230 | year = 1975 | doi=10.1109/tit.1975.1055357}}</ref> making Golomb coding highly suitable for situations in which the occurrence of small values in the input stream is significantly more likely than large values.
 
Golomb coding uses a tunable parameter {{mvar|M}} to divide an input value {{mvar|x}} into two parts: {{mvar|q}}, the result of a division by {{mvar|M}}, and {{mvar|r}}, the remainder. The quotient is sent in [[unary coding]], followed by the remainder in [[truncated binary encoding]]. When <math>M=1</math>, Golomb coding is equivalent to unary coding.
 
Golomb–Rice codes can be thought of as codes that indicate a number by the position of the ''bin'' ({{mvar|q}}), and the ''offset'' within the ''bin'' ({{mvar|r}}). The example figure shows the position {{mvar|q}} and offset {{mvar|r}} for the encoding of integer {{mvar|x}} using Golomb–Rice parameter {{math|''M'' {{=}} 3}}, with source probabilities following a geometric distribution with {{math|''p''(0) {{=}} 0.2}}.
 
Formally, the two parts are given by the following expressions, where {{mvar|x}} is the nonnegative integer being encoded:
 
:<math display="block">q = \left \lfloor \frac{x}{M} \right \rfloor</math>
 
and
 
:<math display="block">r = x - qM.</math>
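
As a minimal illustrative sketch of this split and of the resulting code word, the following Python function encodes the quotient in unary and the remainder in truncated binary (the function name <code>golomb_encode</code> and the 0/1-string output format are choices made here for illustration, not part of any standard API):

<syntaxhighlight lang="python">
def golomb_encode(x: int, m: int) -> str:
    """Encode a nonnegative integer x with Golomb parameter m as a '0'/'1' string."""
    q, r = divmod(x, m)              # quotient and remainder: x = q*m + r
    code = "1" * q + "0"             # quotient q in unary: q ones followed by a 0 delimiter
    if m == 1:
        return code                  # M = 1 reduces to plain unary coding
    # Remainder r in truncated binary encoding.
    k = (m - 1).bit_length()         # ceil(log2(m)): bit width of the long form
    u = (1 << k) - m                 # the first u remainders use the short (k-1)-bit form
    if r < u:
        code += format(r, "b").zfill(k - 1)
    else:
        code += format(r + u, "b").zfill(k)
    return code

# Example with M = 3: golomb_encode(0, 3) == "00", golomb_encode(4, 3) == "1010"
</syntaxhighlight>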
 
[[File:GolombCodeRedundancy.svg|thumb|upright=1.5|This image shows the redundancy, in bits, of the Golomb code, when {{mvar|M}} is chosen optimally, for {{math| 1 − ''p''(0) &ge; 0.45}}]]
 
The integer {{mvar|x}} treated by Golomb was the run length of a [[Bernoulli process]], which has a [[geometric distribution]] starting at 0. The best choice of parameter {{mvar|M}} is a function of the corresponding Bernoulli process, which is parameterized by <math>p = P(x=0)</math>, the probability of success in a given [[Bernoulli trial]]. {{mvar|M}} is either the median of the distribution or the median ±1. It can be determined by these inequalities:
: <math display="block">(1-p)^M + (1-p)^{M+1} \leq 1 < (1-p)^{M-1} + (1-p)^M,</math>
which are solved by
: <math display="block">M = \left\lceil -\frac{\log(2-p)}{\log(1-p)}\right\rceil.</math>
 
For the example with {{math|''p''(0) {{=}} 0.2}}:
: <math display="block">M = \left\lceil -\frac{\log(1.8)}{\log(0.8)}\right\rceil = \left\lceil 2.634 \right\rceil = 3.</math>
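
A minimal Python sketch of this parameter choice (the function name is illustrative; it assumes <math>0 < p < 1</math>):

<syntaxhighlight lang="python">
import math

def golomb_parameter(p: float) -> int:
    """Optimal Golomb parameter M for a geometric source with P(x = 0) = p, 0 < p < 1."""
    return math.ceil(-math.log(2 - p) / math.log(1 - p))

# For the example above: golomb_parameter(0.2) == 3
</syntaxhighlight>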
 
The Golomb code for this distribution is equivalent to the [[Huffman code]] for the same probabilities, if it were possible to compute the Huffman code for the infinite set of source values.
# Skip the 0 delimiter
# Let <math>b = \lfloor\log_2(M)\rfloor</math>
## Interpret the next ''b'' bits as a binary number <math>r'</math>. If <math>r' < 2^{b+1}-M</math> holds, then the remainder is <math>r = r'</math>
## Otherwise, interpret ''b'' + 1 bits as a binary number <math>r'</math>; the remainder is given by <math>r = r' - 2^{b+1} + M</math>
# Compute <math>N = qM + r</math> (a Python sketch of the full procedure is given after this list)
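
The decoding steps above, together with the unary quotient step, can be sketched in Python as follows (the function name and the '0'/'1'-string input format are illustrative choices, assuming a single complete code word):

<syntaxhighlight lang="python">
def golomb_decode(bits: str, m: int) -> int:
    """Decode one Golomb code word (a '0'/'1' string) with parameter m."""
    # Quotient: count the leading 1s, then skip the 0 delimiter.
    q = 0
    pos = 0
    while bits[pos] == "1":
        q += 1
        pos += 1
    pos += 1                              # skip the 0 delimiter

    if m == 1:
        return q                          # M = 1: the code is pure unary

    # Remainder in truncated binary, with b = floor(log2(m)).
    b = m.bit_length() - 1
    r_prime = int(bits[pos:pos + b], 2)
    if r_prime < (1 << (b + 1)) - m:
        r = r_prime                       # short form: b bits
    else:
        # Long form: read one more bit and subtract the offset 2**(b+1) - m.
        r_prime = int(bits[pos:pos + b + 1], 2)
        r = r_prime - ((1 << (b + 1)) - m)
    return q * m + r                      # N = q*M + r

# Example with M = 3: golomb_decode("1010", 3) == 4
</syntaxhighlight>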
 
Consider using a Rice code with a binary portion having {{mvar|b}} bits to run-length encode sequences where ''P'' has a probability {{mvar|p}}. If <math>\mathbb{P}[\text{bit is part of }k\text{-run}]</math> is the probability that a bit will be part of a {{mvar|k}}-bit run (<math>k-1</math> ''P''s and one ''Q'') and <math>(\text{compression ratio of }k\text{-run})</math> is the compression ratio of that run, then the expected compression ratio is
<!-- below mostly comes from above reference (Kiely), but not exactly, so leave uncited for now -->
:<math display="block">\begin{align}
\mathbb{E}[\text{compression ratio}]
&= \sum_{k=1}^\infty (\text{compression ratio of }k\text{-run}) \cdot \mathbb{P}[\text{bit is part of }k\text{-run}] \\
* [[Elias delta coding]]
* [[Variable-length code]]
* [[Exponential-Golomb coding]]
 
== References ==
{{DEFAULTSORT:Golomb Coding}}
[[Category:Entropy coding]]
[[Category:Data compression]]