{{distinguish|Exponential-Golomb coding}}
{{Short description|Lossless data compression method}}
{{More citations needed|section|date=October 2024|talk=Many sections unsourced}}
'''Golomb coding''' is a [[lossless data compression]] method using a family of [[data compression]] codes invented by [[Solomon W. Golomb]] in the 1960s. Alphabets following a [[geometric distribution]] will have a Golomb code as an optimal [[prefix code]],<ref>{{Cite journal | last1 = Gallager | first1 = R. G. |last2 = van Voorhis |first2 = D. C.| title = Optimal source codes for geometrically distributed integer alphabets | journal = [[IEEE Transactions on Information Theory]]| volume = 21 | issue = 2 | pages = 228–230 | year = 1975 | doi=10.1109/tit.1975.1055357}}</ref> making Golomb coding highly suitable for situations in which the occurrence of small values in the input stream is significantly more likely than large values.
Golomb coding uses a tunable parameter {{mvar|M}} to divide an input value {{mvar|x}} into two parts: {{mvar|q}}, the result of a division by {{mvar|M}}, and {{mvar|r}}, the remainder. The quotient is sent in [[unary coding]], followed by the remainder in [[truncated binary encoding]]. When <math>M=1</math>, Golomb coding is equivalent to unary coding.
Golomb–Rice codes can be thought of as codes that indicate a number by the position of the ''bin'' ({{mvar|q}}), and the ''offset'' within the ''bin'' ({{mvar|r}}). The example figure shows the position {{mvar|q}} and offset {{mvar|r}} for the encoding of integer {{mvar|x}} using Golomb–Rice parameter {{math|''M'' {{=}} 3}}, with source probabilities following a geometric distribution with {{math|''p''(0) {{=}} 0.2}}.
Formally, the two parts are given by the following expressions, where {{mvar|x}} is the nonnegative integer being encoded:
: <math>q = \left\lfloor \frac{x}{M} \right\rfloor</math>
and
: <math>r = x - qM.</math>
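The construction can be illustrated with a short sketch. The following Python function (illustrative only, not taken from a cited source) emits the unary-coded quotient followed by the truncated-binary remainder as a string of "0"/"1" characters:

<syntaxhighlight lang="python">
def golomb_encode(x, M):
    """Illustrative Golomb encoder: unary quotient + truncated-binary remainder."""
    q, r = divmod(x, M)            # q = floor(x / M), r = x - q*M
    out = "1" * q + "0"            # unary-coded quotient, terminated by a 0 bit
    b = M.bit_length() - 1         # b = floor(log2(M))
    cutoff = (1 << (b + 1)) - M    # the first 2^(b+1) - M remainders get b bits
    if r < cutoff:
        out += format(r, "b").zfill(b) if b else ""
    else:
        out += format(r + cutoff, "b").zfill(b + 1)
    return out

# Example: golomb_encode(9, 3) == "11100"  (q = 3 -> "1110", r = 0 -> "0")
</syntaxhighlight>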
[[File:GolombCodeRedundancy.svg|thumb|upright=1.5|Redundancy, in bits, of the Golomb code when {{mvar|M}} is chosen optimally, for {{math|1 − ''p''(0) ≥ 0.45}}]]
The integer {{mvar|x}} treated by Golomb was the run length of a [[Bernoulli process]], which has a [[geometric distribution]] starting at 0. The best choice of parameter {{mvar|M}} is a function of the corresponding Bernoulli process, which is parameterized by <math>p = P(x=0)</math>, the probability of success in a given [[Bernoulli trial]]. {{mvar|M}} is either the median of the distribution or the median ±1. It can be determined by these inequalities:
: <math>(1-p)^M + (1-p)^{M+1} \leq 1 < (1-p)^{M-1} + (1-p)^{M},</math>
which are solved by
: <math>M = \left\lceil \frac{\log(2-p)}{-\log(1-p)} \right\rceil.</math>
For the example with {{math|''p''(0) {{=}} 0.2}}:
: <math>M = \left\lceil \frac{\log(1.8)}{-\log(0.8)} \right\rceil = \left\lceil 2.634 \right\rceil = 3.</math>
The Golomb code for this distribution is equivalent to the [[Huffman code]] for the same probabilities, if it were possible to compute the Huffman code for the infinite set of source values.
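The parameter-selection formula above can be checked numerically. The helper below simply evaluates the closed-form ceiling expression (the function name is illustrative, not from any standard library):

<syntaxhighlight lang="python">
import math

def golomb_parameter(p):
    """Smallest M satisfying the optimality inequalities, via the closed form."""
    return math.ceil(math.log(2 - p) / -math.log(1 - p))

# golomb_parameter(0.2) == 3, matching the worked example above
</syntaxhighlight>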
# Read the unary-coded quotient {{mvar|q}} (count the leading 1 bits up to the first 0)
# Skip the 0 delimiter
# Let <math>b = \lfloor\log_2(M)\rfloor</math>
## Interpret the next {{mvar|b}} bits as a binary number <math>r'</math>. If <math>r' < 2^{b+1}-M</math> holds, then the remainder is <math>r = r'</math>
## Otherwise, interpret {{math|''b'' + 1}} bits as a binary number <math>r'</math>; the remainder is <math>r = r' - (2^{b+1}-M)</math>
# Compute <math>N = qM + r</math>
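A matching decoder for these steps can be sketched as follows (illustrative Python; it consumes a string of "0"/"1" characters and returns the decoded value together with the number of bits read):

<syntaxhighlight lang="python">
def golomb_decode(bits, M):
    """Illustrative Golomb decoder following the steps above."""
    q = 0
    while bits[q] == "1":           # 1. read the unary-coded quotient
        q += 1
    pos = q + 1                     # 2. skip the 0 delimiter
    b = M.bit_length() - 1          # 3. b = floor(log2(M))
    cutoff = (1 << (b + 1)) - M
    r = int(bits[pos:pos + b], 2) if b else 0
    if r < cutoff:                  # short (b-bit) remainder
        pos += b
    else:                           # long (b+1-bit) remainder
        r = int(bits[pos:pos + b + 1], 2) - cutoff
        pos += b + 1
    return q * M + r, pos           # 4. N = q*M + r

# Example: golomb_decode("11100", 3) == (9, 5), inverting golomb_encode(9, 3)
</syntaxhighlight>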
Consider using a Rice code with a binary portion having {{mvar|b}} bits to run-length encode sequences where ''P'' has a probability {{mvar|p}}. If <math>\mathbb{P}[\text{bit is part of }k\text{-run}]</math> is the probability that a bit will be part of a {{mvar|k}}-bit run (<math>k-1</math> ''P''s and one ''Q'') and <math>(\text{compression ratio of }k\text{-run})</math> is the compression ratio of that run, then the expected compression ratio is
<!-- below mostly comes from above reference (Kiely), but not exactly, so leave uncited for now -->
: <math>\mathbb{E}[\text{compression ratio}] = \sum_{k=1}^\infty (\text{compression ratio of }k\text{-run}) \cdot \mathbb{P}[\text{bit is part of }k\text{-run}].</math>
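This expectation can be evaluated numerically with a truncated sum, as in the sketch below. The particular per-run cost and weighting used here are stated assumptions (noted in the comments), not formulas taken verbatim from the cited source:

<syntaxhighlight lang="python">
def expected_compression_ratio(p, b, terms=10_000):
    """Approximate the expected compression ratio of Rice-coded runs.

    Assumptions (illustrative):
      * a k-bit run (k-1 Ps and one Q) is coded as the Rice code of k-1 with a
        b-bit binary part, costing ((k-1) >> b) + 1 + b output bits;
      * a randomly chosen input bit lies in a k-bit run with probability
        k * p**(k-1) * (1-p)**2 (length-biased geometric weighting).
    """
    total = 0.0
    for k in range(1, terms + 1):
        cost = ((k - 1) >> b) + 1 + b              # output bits for this run
        weight = k * p ** (k - 1) * (1 - p) ** 2   # probability weight of k-runs
        total += (cost / k) * weight               # output bits per input bit
    return total

# Example: expected_compression_ratio(0.95, 4) is about 0.29, i.e. roughly
# 3.5x compression when long runs of Ps are coded with a well-matched b.
</syntaxhighlight>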
When a probability distribution for integers is not known, the optimal parameter for a Golomb–Rice encoder cannot be determined. Thus, in many applications, a two-pass approach is used: first, the block of data is scanned to estimate a probability density function (PDF) for the data. The Golomb–Rice parameter is then determined from that estimated PDF. A simpler variation of that approach is to assume that the PDF belongs to a parametrized family, estimate the PDF parameters from the data, and then compute the optimal Golomb–Rice parameter. That is the approach used in most of the applications discussed below.
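As a sketch of the simpler variation described above, assuming the data roughly follow a geometric distribution, the first pass can set the Rice parameter from the sample mean; the specific estimator below (the smallest {{mvar|k}} with {{math|2<sup>''k''</sup>}} at least the mean) is illustrative rather than prescribed by the sources cited here:

<syntaxhighlight lang="python">
def estimate_rice_parameter(samples):
    """First pass: pick a Rice parameter k = log2(M) from the sample mean."""
    mean = sum(samples) / len(samples)
    k = 0
    while (1 << k) < mean:      # smallest k such that 2**k >= mean
        k += 1
    return k

# Second pass: encode every sample with the fixed parameter M = 2**k,
# e.g. using a Golomb/Rice encoder such as the golomb_encode sketch above.
</syntaxhighlight>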
An alternative approach to efficiently encode integer data whose PDF is not known, or is varying, is to use a backwards-adaptive encoder. The RLGR encoder [https://www.microsoft.com/en-us/research/publication/adaptive-run-length-golomb-rice-encoding-of-quantized-generalized-gaussian-sources-with-unknown-statistics/] achieves this by adjusting the Golomb–Rice parameter up or down as a function of previously encoded symbols, so that the decoder can track the same parameter adaptation from the decoded data and no side information needs to be transmitted.
== Applications ==
The [[Lossless JPEG#JPEG-LS|JPEG-LS]] scheme uses Rice–Golomb coding to encode the prediction residuals.
The adaptive version of Golomb–Rice coding mentioned above, the RLGR encoder [https://www.microsoft.com/en-us/research/publication/adaptive-run-length-golomb-rice-encoding-of-quantized-generalized-gaussian-sources-with-unknown-statistics/], is used for encoding screen content in the [[RemoteFX]] component of the Microsoft [[Remote Desktop Protocol]].
==See also==
* [[Elias delta coding]]
* [[Variable-length code]]
* [[Exponential-Golomb coding]]
== References ==
{{Reflist}}
{{DEFAULTSORT:Golomb Coding}}
[[Category:Entropy coding]]
[[Category:Data compression]]